This dissertation is concerned with the sequential estimation of a multivariate normal
mean vector and of a vector of regression parameters from a general linear model with
multivariate normal error structure. In both cases the probabilistic setting is that of a
hierarchical Bayes model.

We propose approximations to the optimal sequential Bayes stopping rules associated
with these problems. These approximations are asymptotically pointwise optimal (A.P.O.) in
the sense of Bickel and Yahav (1967, Proceedings of the 5th Berkeley Symposium on
Mathematical Statistics and Probability, VI, pp. 401-413), and are developed in that sense.
Using these A.P.O. stopping rules in conjunction with several natural estimators, we establish
second order asymptotic Bayes risk expansions for the various estimation procedures. In
addition we obtain second order expansions for the Bayes risks of the optimal procedures.
These provide a standard for comparison of the performances of the approximate procedures
in both the vector of means and vector of regression parameters cases. We find that the A.P.O.
stopping rules, in conjunction with estimation by posterior mean vectors, yield asymptotically
"nondeficient" procedures in the sense of Woodroofe (1981, Zeitschrift für
Wahrscheinlichkeitstheorie und Verwandte Gebiete, pp. 331-341). We note the nature of the
deficiencies associated with the other approximate procedures.
CHAPTER 1
INTRODUCTION 


1.1  Sequential  Estimation 
It is well known that in sequential problems, once a Bayes rule tells one to stop, the
Bayes action is independent of the stopping rule, and is the same as its fixed sample
counterpart. However, the problem of determining Bayes stopping rules is usually more
formidable, and although such stopping rules exist under fairly general conditions (see, e.g.,
Theorems 4.4 and 4.5 of Chow, Robbins and Siegmund, 1970), their exact determination is
usually very difficult. Bickel and Yahav (1967, 1968, 1969a, 1969b), in a series of articles,
have developed stopping rules which are asymptotically equivalent to Bayes stopping rules as
$c$ (the cost per unit sample) goes to zero. They have called these rules "asymptotically
pointwise optimal" (A.P.O.) rules. These authors have proved certain asymptotic optimality
properties of the A.P.O. rules in sequential estimation and hypothesis testing for a general
class of distributions including but not limited to the one-parameter exponential family. In
the context of sequential estimation, stronger asymptotic optimality results are proved by
Woodroofe (1981) for the one-parameter exponential family. In particular, he has shown that
A.P.O. rules are asymptotically "nondeficient," i.e., the difference between the Bayes risks of
a Bayes estimator under the optimal Bayes rule and the A.P.O. rule is $o(c)$ as $c \to 0$. A
similar result is proved by Woodroofe for estimation of the univariate normal mean with
unknown variance. Woodroofe's results are extended by Rehalia (1984) without the
assumption of conjugate priors. Recently, Finster (1987) has extended Woodroofe's results
to a normal regression model. All of these results establish nondeficiency via second-order
asymptotic expansions of the Bayes risks of the various procedures. Our results continue in
this vein.

We can illustrate the concepts discussed so far in an example which foreshadows the
results of the dissertation. Suppose that given $\Theta = \theta$ and $R = r$, $X_1, X_2, \ldots$ are iid
$N(\theta, r^{-1})$. Also, suppose that given $R = r$, $\Theta \sim N(m, (\lambda r)^{-1})$. Finally, suppose that
$R \sim \mathrm{Gamma}(\tfrac12 a, \tfrac12 b)$. That is, the $X_i$'s have a conjugate prior distribution. Suppose we
want to estimate $\theta$ sequentially where our loss function is squared error plus linear cost. A
Bayes sequential estimation procedure exists under this scenario, but it is not immediately
accessible. As an alternative, an A.P.O. stopping rule could be determined as follows. The
posterior risk for this problem, using the posterior mean as estimator, say $\hat\theta_n$, is given by


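\[
\begin{aligned}
E\big[(\Theta-\hat\theta_n)^2 \mid X_1,\ldots,X_n\big] + nc
&= E\big[\,E\big((\Theta-\hat\theta_n)^2 \mid R, X_1,\ldots,X_n\big) \,\big|\, X_1,\ldots,X_n\big] + nc \\
&= E\big[((\lambda+n)R)^{-1} \mid X_1,\ldots,X_n\big] + nc \\
&= (\lambda+n)^{-1}E(R^{-1}\mid X_1,\ldots,X_n) + nc.
\end{aligned}
\]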


Asymptotically pointwise optimal stopping rules are derived from the fact that for large $n$,
$E(R^{-1}\mid X_1,\ldots,X_n) \approx R^{-1}$, and hence that the posterior risk behaves as an essentially
deterministic function with a computable minimum with respect to $n$. Differentiation yields
that $(\lambda+n)^{-1}R^{-1} + nc$ is minimized at $n$ such that $R^{-1} = (\lambda+n)^2 c$. A stopping rule based
on this idea could be "stop at the first $n$ such that $E(R^{-1}\mid X_1,\ldots,X_n) \le (\lambda+n)^2 c$." It turns

out  that  this  rule  possesses  asymptotic  pointwise  optimality  in  the  following  sense:    suppose 

T  =  T(c)  is  the  rule  described  and  S  =  S(c)  is  any  other  stopping  rule,  then 


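\[
\lim_{c\to 0}\;
\frac{E\big[(\Theta-\hat\theta_T)^2 \mid X_1,\ldots,X_T\big] + Tc}
     {E\big[(\Theta-\hat\theta_S)^2 \mid X_1,\ldots,X_S\big] + Sc}
\;\le\; 1 \quad \text{a.s.}
\]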


This property of $T$ follows because of the stability of
$nE\big[(\Theta-\hat\theta_n)^2 \mid X_1,\ldots,X_n\big]$ as $n \to \infty$, and
is proved in Bickel and Yahav (1967). As Woodroofe points out in his 1981 paper, an
implication of being A.P.O. is that such a rule must be asymptotically equivalent to the
Bayes stopping rule. It also follows from their arguments that any rule asymptotically
equivalent to $T$ is also A.P.O. This is a weakness of the property and led Woodroofe to
consider nondeficiency as a more refined approximation.
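The stopping rule of this example can be sketched numerically. The snippet below is only an illustration under stated assumptions: the function names, the parameter values, and the use of the standard normal-gamma conjugate update for $E(R^{-1}\mid X_1,\ldots,X_n)$ are ours and are not taken from the text. It stops at the first $n$ with $E(R^{-1}\mid X_1,\ldots,X_n) \le (\lambda+n)^2c$ and reports the deterministic value $(rc)^{-1/2}-\lambda$ for comparison.

```python
import math
import random


def apo_stop(draw, m, lam, a, b, c, n_max=1_000_000):
    """Return (n, posterior mean) at the first n with E(1/R | data) <= (lam + n)^2 * c.

    Assumes R ~ Gamma(rate a/2, shape b/2) with b + n > 2 for every n >= 1.
    """
    n, mean, ss = 0, 0.0, 0.0  # count, running sample mean, sum of squares about the mean
    while n < n_max:
        x = draw()
        n += 1
        delta = x - mean
        mean += delta / n
        ss += delta * (x - mean)  # Welford update of the centred sum of squares
        # Conjugate update: R | data ~ Gamma(rate (a + ss + s2)/2, shape (b + n)/2),
        # hence E(1/R | data) = (a + ss + s2) / (b + n - 2).
        s2 = lam * n * (mean - m) ** 2 / (lam + n)
        post_inv_r = (a + ss + s2) / (b + n - 2.0)
        if post_inv_r <= (lam + n) ** 2 * c:
            return n, (lam * m + n * mean) / (lam + n)
    raise RuntimeError("no stop before n_max")


def simulate(seed=0, theta=1.0, r=4.0, c=1e-6, lam=1.0):
    """Run the rule once for fixed (hypothetical) theta and r; also return the target n."""
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(r)
    n, est = apo_stop(lambda: rng.gauss(theta, sd), m=0.0, lam=lam, a=2.0, b=4.0, c=c)
    target = 1.0 / math.sqrt(r * c) - lam  # solves 1/r = (lam + n)^2 * c
    return n, est, target
```

For small $c$ the stopping time should be close to the deterministic target, which is the behavior the A.P.O. property formalizes.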

1.2    Hierarchical   and    Empirical   Bayes   Models 
This dissertation concerns development of A.P.O. rules for hierarchical and empirical
Bayes models. The idea behind a hierarchical Bayes model is that it is often convenient to
model the subjective prior information in stages. In an empirical Bayes scenario, one can
exploit relationships among the coordinates of a parameter vector, say $\theta = (\theta_1,\ldots,\theta_p)^T$, by
first putting a prior distribution on $\theta$, and then estimating the prior parameters (usually
smaller in number than $p$) from the joint marginal distribution of the observations. For
example, in a field trial scenario, $\theta_1,\ldots,\theta_p$ might be mean yields associated with $p$ randomly
selected plots. The $\theta_i$'s can be thought of as separate realizations of a common random
variable with mean $\mu$. In this way, for drawing inference about a particular coordinate, say
$\theta_i$, information from other coordinates is also used.

We also illustrate hierarchical and empirical Bayes models by example. Consider the
following hierarchical Bayes model (Lindley and Smith, 1972). Suppose $X_1,\ldots,X_p$ are
independent with distributions $N(\theta_i, 1)$ $(i = 1,\ldots,p)$, given the $\theta_i$'s. Also, suppose $\theta_1,\ldots,\theta_p$
are iid $N(\mu, 1)$. The hierarchical Bayes approach takes $\mu$ as unknown and places a prior
distribution on it. We take the improper prior, $\mu \sim \mathrm{uniform}(-\infty, \infty)$. The end result of this
model is that, given $X_i = x_i$ $(i = 1,\ldots,p)$, the coordinates $\theta_1,\ldots,\theta_p$ are dependent under
their joint posterior distribution. Specifically, $\theta = (\theta_1,\ldots,\theta_p)^T$, given $X = x = (x_1,\ldots,x_p)^T$,
is distributed
$N_p\big(\tfrac12 x + \tfrac12\bar x\mathbf{1}_p,\ \tfrac12 I_p + (2p)^{-1}\mathbf{1}_p\mathbf{1}_p^T\big)$, where $\bar x = p^{-1}\sum_{i=1}^p x_i$ and
$\mathbf{1}_p = (1,\ldots,1)^T$. For an example of the empirical Bayes approach, suppose we have, as
before, $X_i \sim N(\theta_i, 1)$, independently, and $\theta_i \sim N(\mu, 1)$, iid. Under this model, marginally
$X_1,\ldots,X_p$ are iid $N(\mu, 2)$. The unknown, $\mu$, is estimated from the data. We use
$\bar x = p^{-1}\sum_{i=1}^p x_i$. This gives us an estimated posterior distribution for $\theta$. Specifically,
given $X = x$, $\theta \sim N_p\big(\tfrac12 x + \tfrac12\bar x\mathbf{1}_p,\ \tfrac12 I_p\big)$. Note that in the two examples we have a
common posterior mean, but the empirical model does not adjust for the error associated
with estimating $\mu$. The implication is that the hierarchical Bayes approach can be used
effectively in an empirical Bayes setup, typically by putting a suitable prior distribution on
$\theta$ at the first stage, and then putting another distribution on the prior parameters (usually
referred to as hyperparameters) at the second stage. The second stage prior is intended to
model the uncertainty in our knowledge of the hyperparameters.
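The two posterior summaries in this example can be compared directly. The following is a small sketch using plain Python lists; the function names are ours, and the precision matrix $2I_p - p^{-1}\mathbf{1}_p\mathbf{1}_p^T$ used as a cross-check is our own derivation (integrating the flat prior on $\mu$ out of the $N(\mu,1)$ stage), not something stated in the text.

```python
def posterior_summaries(x):
    """Posterior mean, hierarchical Bayes covariance, and empirical Bayes covariance."""
    p = len(x)
    xbar = sum(x) / p
    mean = [0.5 * xi + 0.5 * xbar for xi in x]  # common to both approaches
    eb_cov = [[0.5 if i == j else 0.0 for j in range(p)] for i in range(p)]
    # The hierarchical model adds (2p)^{-1} 1 1^T for the uncertainty about mu.
    hb_cov = [[eb_cov[i][j] + 1.0 / (2 * p) for j in range(p)] for i in range(p)]
    return mean, hb_cov, eb_cov


def hb_precision(p):
    """Posterior precision 2 I - (1/p) 1 1^T (our derivation, used only as a check)."""
    return [[(2.0 if i == j else 0.0) - 1.0 / p for j in range(p)] for i in range(p)]


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    k = len(b)
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(k)] for i in range(k)]
```

Multiplying `hb_precision(p)` by the hierarchical covariance should return the identity, and the diagonal entries of the two covariances differ by exactly $(2p)^{-1}$, the adjustment that the empirical Bayes posterior omits.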

1.3     Subject   of  Research 

The subject of this research is the performance of A.P.O. stopping rules in the
context of estimating both a vector of population means and a vector of regression parameters
under hierarchical Bayes models.

In Chapter 2, we develop an A.P.O. rule for the estimation of a vector of means
under a hierarchical Bayes model originally introduced by Lindley and Smith (1972). Our
A.P.O. rule remains asymptotically nondeficient for a wide class of proper (bona fide
distribution) priors on the hyperparameters. Further, we consider the performances of two
other procedures in the context of the hierarchical model. The first is derived from
calculations based on placing a diffuse or improper prior on the hyperparameters. This yields
the same stopping rule but a different estimator for $\theta$. The second retains the original
stopping rule and uses the sample mean vector to estimate $\theta$. We determine asymptotic risk
expansions for all three procedures, and provide performance comparisons.

There is one article to date pertinent to the discussion of Chapter 2, that is, to the
estimation of means under hierarchical Bayes models via A.P.O. stopping rules. Martinsek
(1987) develops A.P.O. stopping rules under empirical Bayes models and applies them to
establish nondeficiency of a procedure for estimating a single normal mean. The
corresponding features of our results are discussed in Chapter 2.

In Chapter 3, we develop an A.P.O. rule for the estimation of a vector of regression
parameters from a general linear model with a hierarchical prior distribution structure.
Again, our A.P.O. rule remains asymptotically nondeficient for a wide class of priors on the
hyperparameters. We again consider the performances of two other procedures, under the
proper hierarchical model: a diffuse prior procedure and a "classical" procedure which uses
the weighted least squares estimate of $\beta$, the vector of regression parameters. Both of these
use the original A.P.O. stopping rule. We determine asymptotic risk expansions for all three
procedures, and provide performance comparisons. One should note that the regression model
contains the vector of means model as a special case.

There is also one paper relevant to the discussion of Chapter 3. Finster (1987)
considers a single-stage conjugate prior setup for estimating $\beta$ under the same regression
model that we use. He shows that the classical one-step look-ahead, or myopic, stopping rule
is A.P.O. and, in fact, nondeficient for estimating $\beta$, under a loss structure different from
ours. The different loss and lack of hyperparameters make direct comparison tenuous, but
Finster obtains a risk expansion which matches those obtained in Chapter 3 in their first
order terms and in two second order terms. This correspondence is noted in Section 3.4.

1.4     Some   Tools 

We will be considering vector parameters associated with vector observations, and
will be performing numerous matrix computations. Also, we will operate on numerous
martingales and submartingales. At this point it is convenient to establish two lemmas. The
first lemma is very useful in establishing inequalities and will typically be used without
comment. Recall, for concreteness, that the spectral norm of a real, square matrix $A$ is given
by $\{\text{largest eigenvalue of } A^TA\}^{1/2}$. It has the defining representation $|A| = \sup_{|z|=1}|Az|$, where
$z$ is a vector, and the r.h.s. norms are euclidean vector norms.


Lemma 1.1: Suppose we have two sequences of positive definite matrices, $\{A_n, n \ge 1\}$ and
$\{B_n, n \ge 1\}$, such that

$A_n \to A$, positive definite,
and
$B_n \to B$, positive definite,

where the convergence is elementwise. Then, for any sequence of vectors, $x_n$, there exists a
positive constant $k$ such that

\[
x_n^TA_nx_n \le k\,x_n^TB_nx_n \quad \text{for all } n.
\]


Proof: The case of a null vector is immediate, so we assume $x_n \ne 0$, $n \ge 1$.
Since elementwise convergence is equivalent to convergence in, for example, spectral
norm, easy matrix norm properties can be used to show that $B_n^{-1}A_n \to B^{-1}A$, which has
positive eigenvalues (the eigenvalues of $B^{-1}A$ are the same as the eigenvalues of
$A^{1/2}B^{-1}A^{1/2}$). It follows that the supremum over $n$ of the largest eigenvalue of
$B_n^{-1}A_n$ is finite. We take this number as $k$. The result follows from the fact that for any
pair of positive definite matrices, $C$ and $D$,

\[
\sup_{x \ne 0}\frac{x^TCx}{x^TDx} \le \lambda,
\]

where $\lambda$ is the largest eigenvalue of $D^{-1}C$.
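The final step of the proof can be illustrated numerically. The sketch below is restricted to $2 \times 2$ matrices so that it needs only the standard library; the function names and the example matrices are illustrative assumptions, not part of the lemma. It computes the largest eigenvalue of $D^{-1}C$ in closed form and compares it with the ratio $x^TCx / x^TDx$ over many random directions.

```python
import math
import random


def largest_eig_of_dinv_c(c, d):
    """Largest eigenvalue of D^{-1} C for symmetric positive definite 2x2 matrices."""
    det_d = d[0][0] * d[1][1] - d[0][1] * d[1][0]
    dinv = [[d[1][1] / det_d, -d[0][1] / det_d], [-d[1][0] / det_d, d[0][0] / det_d]]
    m = [[sum(dinv[i][k] * c[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    # D^{-1} C is similar to a symmetric positive definite matrix, so its roots are real.
    return 0.5 * (tr + math.sqrt(max(tr * tr - 4.0 * det, 0.0)))


def quad(a, x):
    """Quadratic form x^T a x."""
    return sum(x[i] * a[i][j] * x[j] for i in range(2) for j in range(2))


def max_ratio(c, d, trials=20000, seed=1):
    """Largest observed x^T C x / x^T D x over random Gaussian directions x."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        best = max(best, quad(c, x) / quad(d, x))
    return best
```

In this setting the observed maximum never exceeds the eigenvalue and approaches it as more directions are sampled.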

The  second  lemma  is  a  collection  of  basic  martingale  results,  which  will  be  used 
without  reference. 


Lemma 1.2: Suppose $\{S_n, F_n, n \ge 1\}$ is a submartingale.

a) If $G_n$ is a sequence of $\sigma$-fields such that $G_n \subset F_n$, $n \ge 1$, and $S_n$ is $G_n$
measurable, then $\{S_n, G_n, n \ge 1\}$ is a submartingale.

b) If $\phi$ is any real nondecreasing convex function with $E|\phi(S_n)| < \infty$, then
$\{\phi(S_n), F_n, n \ge 1\}$ is a submartingale. If $S_n$ is a martingale, then $\phi$ need only
be convex.

c) If $\{S_n, n \ge 1\}$ are uniformly integrable, then $S_\infty = \lim S_n$ exists almost surely
and $\{S_n, F_n, 1 \le n \le \infty\}$ is a submartingale, where $F_\infty = \sigma\big(\bigcup_{n=1}^\infty F_n\big)$.

d) If $T$ is a finite stopping time with

\[
E|S_T| < \infty, \qquad \lim_{n\to\infty}\int_{[T>n]}|S_n| = 0,
\]

then

\[
E(S_T) \ge E(S_1).
\]

If $S_n$ is a martingale, then there is equality in the expression above.
Proof: See Chow and Teicher (1978), Chapter Seven.


CHAPTER  2 

ASYMPTOTICALLY POINTWISE OPTIMAL STOPPING RULES

FOR  THE   ESTIMATION   OF   A   VECTOR   OF 

MEANS   UNDER  A   HIERARCHICAL   BAYES   MODEL 


2.1  Introduction 
In this chapter we consider the sequential estimation of a multivariate normal mean,
$\theta$, under a hierarchical Bayes model. Section 2.2 describes the model and develops the
A.P.O. stopping rule, under a pure sequential sampling scheme. With these in place, we give,
in Theorem 2.1, an asymptotic risk expansion for our sequential procedure (which is to stop
according to the stopping rule, and estimate using the posterior mean). Next, in Theorem
2.2, we establish that this procedure is asymptotically nondeficient, by exhibiting an
expansion for the sequential Bayes procedure. In Section 2.3 we look at two procedures that
use the same stopping rule as before, but which estimate $\theta$ differently. The first estimates $\theta$
via a posterior mean from a model with an improper prior on the hyperparameters. The
second estimates $\theta$ by the sample mean vector, which arises as a limit of the posterior mean.
These two procedures are evaluated under the original hierarchical model, with the results
given in Theorems 2.3 and 2.4. In Section 2.4 we compare the performances of our
procedures.

2.2     A. P.O.    Rule    Under   a    Hierarchical    Bayes    Model 
Consider the following hierarchical Bayes model. Suppose that conditional on
$\Theta = \theta = (\theta_1,\ldots,\theta_p)^T$ $(p \ge 2)$ and $R = r$, $X_1, X_2, \ldots$ are iid $N_p(\theta, r^{-1}\Sigma)$, where $\Sigma$ is a
known positive definite (p.d.) matrix. Suppose also that conditional on $M = m$ and $R = r$,
$\Theta \sim N_p(m\mathbf{1}_p, (\lambda r)^{-1}D)$, where $\mathbf{1}_p$ is a $p$-component column vector with all elements equal
to 1, $\lambda$ $(> 0)$ is known, and $D$ is a known p.d. matrix. It is assumed that marginally $M$ and $R$
are independently distributed with $M$ having a proper pdf $g(m)$ on $(-\infty, \infty)$ such that
$\int_{-\infty}^{\infty} m^2 g(m)\,dm < \infty$, while $R \sim \mathrm{Gamma}(\tfrac12 a, \tfrac12 b)$ with $a > 0$ and $b > 0$. In the above
and in what follows, we say that $Z \sim \mathrm{Gamma}(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$, when $Z$ has pdf

\[
f(z) = \exp(-\alpha z)\,z^{\beta-1}\alpha^{\beta}/\Gamma(\beta), \qquad z > 0.
\]


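In order to motivate the A.P.O. rule, first we need to find the posterior distribution of $\Theta$
given $X_i = x_i$ $(i = 1,\ldots,n)$. Note that the joint pdf of $X_1,\ldots,X_n$, $\Theta$, $M$ and $R$ is given by

\[
\begin{aligned}
f(x_1,\ldots,x_n,\theta,m,r)
\propto{}& r^{\frac12 np}\exp\Big[-\tfrac12 r\sum_{i=1}^n (x_i-\theta)^T\Sigma^{-1}(x_i-\theta)\Big] \\
&\times r^{\frac12 p}\exp\Big[-\tfrac12\lambda r(\theta-m\mathbf{1}_p)^TD^{-1}(\theta-m\mathbf{1}_p)\Big] \\
&\times e^{-\frac12 ar}\,r^{\frac12 b-1}g(m).
\end{aligned}
\tag{2.2.1}
\]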


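Next write

\[
\sum_{i=1}^n (x_i-\theta)^T\Sigma^{-1}(x_i-\theta)
= \sum_{i=1}^n (x_i-\bar x_n)^T\Sigma^{-1}(x_i-\bar x_n) + n(\bar x_n-\theta)^T\Sigma^{-1}(\bar x_n-\theta),
\tag{2.2.2}
\]

where $\bar x_n = n^{-1}\sum_{i=1}^n x_i$. Also, it follows after some simplifications that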
i=f' 
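\[
\begin{aligned}
& n(\bar x_n-\theta)^T\Sigma^{-1}(\bar x_n-\theta) + \lambda(\theta-m\mathbf{1}_p)^TD^{-1}(\theta-m\mathbf{1}_p) \\
&\quad= \big[\theta-(n\Sigma^{-1}+\lambda D^{-1})^{-1}(n\Sigma^{-1}\bar x_n+\lambda mD^{-1}\mathbf{1}_p)\big]^T(n\Sigma^{-1}+\lambda D^{-1}) \\
&\qquad\times\big[\theta-(n\Sigma^{-1}+\lambda D^{-1})^{-1}(n\Sigma^{-1}\bar x_n+\lambda mD^{-1}\mathbf{1}_p)\big] \\
&\qquad+\big\{n\bar x_n^T\Sigma^{-1}\bar x_n+\lambda m^2\mathbf{1}_p^TD^{-1}\mathbf{1}_p \\
&\qquad\quad-(n\Sigma^{-1}\bar x_n+\lambda mD^{-1}\mathbf{1}_p)^T(n\Sigma^{-1}+\lambda D^{-1})^{-1}(n\Sigma^{-1}\bar x_n+\lambda mD^{-1}\mathbf{1}_p)\big\}.
\end{aligned}
\tag{2.2.3}
\]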


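Next, using Exercise 2.9 (p. 33) of Rao (1973), one gets

\[
(n\Sigma^{-1}+\lambda D^{-1})^{-1} = n^{-1}\Sigma - n^{-2}\Sigma(n^{-1}\Sigma+\lambda^{-1}D)^{-1}\Sigma;
\tag{2.2.4}
\]

\[
\phantom{(n\Sigma^{-1}+\lambda D^{-1})^{-1}} = \lambda^{-1}D - \lambda^{-2}D(n^{-1}\Sigma+\lambda^{-1}D)^{-1}D;
\tag{2.2.5}
\]

\[
\phantom{(n\Sigma^{-1}+\lambda D^{-1})^{-1}} = (n\lambda)^{-1}D(n^{-1}\Sigma+\lambda^{-1}D)^{-1}\Sigma.
\tag{2.2.6}
\]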


Using (2.2.4) - (2.2.6), one gets, after some simplifications, that the term within braces on
the right hand side of (2.2.3) equals

\[
(\bar x_n-m\mathbf{1}_p)^T(n^{-1}\Sigma+\lambda^{-1}D)^{-1}(\bar x_n-m\mathbf{1}_p).
\tag{2.2.7}
\]
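The three expressions for $(n\Sigma^{-1}+\lambda D^{-1})^{-1}$ in (2.2.4) - (2.2.6) can be checked numerically. The sketch below works with $2 \times 2$ matrices stored as nested lists so that no external library is needed; the helper names and the example matrices in the usage are illustrative choices, not taken from the text.

```python
def inv2(a):
    """Inverse of a nonsingular 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def lin(alpha, a, beta, b):
    """Linear combination alpha * a + beta * b."""
    return [[alpha * a[i][j] + beta * b[i][j] for j in range(2)] for i in range(2)]


def four_forms(sigma, d, n, lam):
    """Left hand side and the right hand sides of (2.2.4), (2.2.5), (2.2.6)."""
    lhs = inv2(lin(n, inv2(sigma), lam, inv2(d)))
    mid = inv2(lin(1.0 / n, sigma, 1.0 / lam, d))  # (n^{-1} Sigma + lam^{-1} D)^{-1}
    form4 = lin(1.0 / n, sigma, -1.0 / n ** 2, mul(mul(sigma, mid), sigma))
    form5 = lin(1.0 / lam, d, -1.0 / lam ** 2, mul(mul(d, mid), d))
    form6 = lin(1.0 / (n * lam), mul(mul(d, mid), sigma), 0.0, sigma)
    return lhs, form4, form5, form6


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))
```

For any positive definite `sigma` and `d`, all four returned matrices should agree up to rounding error.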


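Now, writing

\[
s_{n1} = \sum_{i=1}^n (x_i-\bar x_n)^T\Sigma^{-1}(x_i-\bar x_n)
\]

and

\[
s_{n2}(m) = (\bar x_n-m\mathbf{1}_p)^T(n^{-1}\Sigma+\lambda^{-1}D)^{-1}(\bar x_n-m\mathbf{1}_p),
\]

(and letting $S_{n1}$ and $S_{n2}$ denote the corresponding random variables) one gets from
(2.2.1) - (2.2.3) and (2.2.7),

\[
\begin{aligned}
f(x_1,\ldots,x_n,\theta,m,r)
\propto{}& r^{\frac12 p}\exp\Big[-\tfrac12 r\big\{\theta-(n\Sigma^{-1}+\lambda D^{-1})^{-1}(n\Sigma^{-1}\bar x_n+\lambda mD^{-1}\mathbf{1}_p)\big\}^T(n\Sigma^{-1}+\lambda D^{-1}) \\
&\qquad\times\big\{\theta-(n\Sigma^{-1}+\lambda D^{-1})^{-1}(n\Sigma^{-1}\bar x_n+\lambda mD^{-1}\mathbf{1}_p)\big\}\Big] \\
&\times r^{\frac12(np+b)-1}\exp\Big[-\tfrac12 r\big(s_{n1}+s_{n2}(m)+a\big)\Big]\,g(m).
\end{aligned}
\tag{2.2.8}
\]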


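Formula (2.2.8) has several important consequences. First, conditional on $X_i = x_i$
$(i = 1,\ldots,n)$, $M = m$, and $R = r$,

\[
\Theta \sim N_p\Big[(n\Sigma^{-1}+\lambda D^{-1})^{-1}(n\Sigma^{-1}\bar x_n+\lambda mD^{-1}\mathbf{1}_p),\ r^{-1}(n\Sigma^{-1}+\lambda D^{-1})^{-1}\Big].
\tag{2.2.9}
\]

Second, the joint marginal pdf of $X_1,\ldots,X_n$, $M$, and $R$ is given by

\[
f(x_1,\ldots,x_n,m,r) \propto \exp\Big[-\tfrac12 r\big(s_{n1}+s_{n2}(m)+a\big)\Big]\,r^{\frac12(np+b)-1}g(m).
\tag{2.2.10}
\]

Hence, conditional on $X_i = x_i$ $(i = 1,\ldots,n)$ and $M = m$,

\[
R \sim \mathrm{Gamma}\Big[\tfrac12\big(s_{n1}+s_{n2}(m)+a\big),\ \tfrac12(np+b)\Big].
\tag{2.2.11}
\]

Finally, the joint marginal pdf of $X_1,\ldots,X_n$, and $M$ is given by

\[
f(x_1,\ldots,x_n,m) \propto \big[s_{n1}+s_{n2}(m)+a\big]^{-\frac12(np+b)}g(m).
\tag{2.2.12}
\]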
Consider now the problem of sequentially estimating $\theta$ by an estimate $e$ under
squared error loss, and cost $c$ $(> 0)$ per unit sample. Let $\mathcal{A}_0$ denote the trivial $\sigma$-algebra and
$\mathcal{A}_n$ the $\sigma$-algebra generated by $X_1,\ldots,X_n$ $(n \ge 1)$. The Bayes sequential decision problem is
to find a stopping time $\tau$ and an $\mathcal{A}_\tau$-measurable function $e_\tau = e_\tau(X_1,\ldots,X_\tau)$ for which
$E\big[\|\Theta-e_\tau\|^2+c\tau\big]$ is minimized. For every stopping time $\tau$, the Bayes risk is minimized
by $e_\tau = E(\Theta\mid\mathcal{A}_\tau) = \hat\theta_\tau$, say, where it follows from (2.2.9) that for every $n \ge 1$,

\[
\hat\theta_n = E(\Theta\mid\mathcal{A}_n) = (n\Sigma^{-1}+\lambda D^{-1})^{-1}\big(n\Sigma^{-1}\bar X_n+\lambda E(M\mid\mathcal{A}_n)D^{-1}\mathbf{1}_p\big).
\tag{2.2.13}
\]

Also

\[
\hat\theta_0 = E(\Theta\mid\mathcal{A}_0) = E(M\mid\mathcal{A}_0)\mathbf{1}_p = (EM)\mathbf{1}_p.
\]


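Next, observe that if $b > 2$ in the Gamma prior of $R$,

\[
\begin{aligned}
E\big[\|\Theta-\hat\theta_n\|^2\mid\mathcal{A}_n\big]
&= \operatorname{tr}\{V(\Theta\mid\mathcal{A}_n)\} \\
&= \operatorname{tr}\big\{E[V(\Theta\mid\mathcal{A}_n,M,R)\mid\mathcal{A}_n]+V[E(\Theta\mid\mathcal{A}_n,M,R)\mid\mathcal{A}_n]\big\} \\
&= \operatorname{tr}\big\{E\big[R^{-1}(n\Sigma^{-1}+\lambda D^{-1})^{-1}\mid\mathcal{A}_n\big]
 +V\big[(n\Sigma^{-1}+\lambda D^{-1})^{-1}(n\Sigma^{-1}\bar X_n+\lambda MD^{-1}\mathbf{1}_p)\mid\mathcal{A}_n\big]\big\} \\
&= \operatorname{tr}\big\{(n\Sigma^{-1}+\lambda D^{-1})^{-1}\big\}E[R^{-1}\mid\mathcal{A}_n] \\
&\quad+\operatorname{tr}\big\{(n\Sigma^{-1}+\lambda D^{-1})^{-1}(\lambda D^{-1}\mathbf{1}_p)(\lambda D^{-1}\mathbf{1}_p)^T(n\Sigma^{-1}+\lambda D^{-1})^{-1}\big\}V[M\mid\mathcal{A}_n].
\end{aligned}
\tag{2.2.14}
\]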


Write $F_n = (\Sigma^{-1}+\lambda n^{-1}D^{-1})^{-1}$ and $G = D^{-1}\mathbf{1}_p\mathbf{1}_p^TD^{-1}$.

Note that

\[
E[R^{-1}\mid\mathcal{A}_n,M] = \frac{S_{n1}+S_{n2}(M)+a}{np+b-2}.
\tag{2.2.15}
\]
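It follows now from (2.2.14) and (2.2.15) that

\[
\begin{aligned}
E\big[\|\Theta-\hat\theta_n\|^2\mid\mathcal{A}_n\big]
&= \frac{n^{-1}\operatorname{tr}(F_n)\{S_{n1}+E[S_{n2}(M)\mid\mathcal{A}_n]+a\}}{np+b-2}
 + n^{-2}\lambda^2\operatorname{tr}(F_nGF_n)V[M\mid\mathcal{A}_n] \\
&= n^{-1}U_n+W_n \quad\text{(say)},
\end{aligned}
\tag{2.2.16}
\]

where

\[
U_n = \frac{\operatorname{tr}(\Sigma)S_{n1}}{np+b-2},
\tag{2.2.17}
\]

\[
W_n = \frac{n^{-1}\operatorname{tr}(F_n-\Sigma)S_{n1}}{np+b-2}
 + \frac{n^{-1}\operatorname{tr}(F_n)\{E[S_{n2}(M)\mid\mathcal{A}_n]+a\}}{np+b-2}
 + n^{-2}\lambda^2\operatorname{tr}(F_nGF_n)V[M\mid\mathcal{A}_n].
\tag{2.2.18}
\]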


We shall now show that $n^{-1}U_n$ is the dominating term in (2.2.16). To see this, first
observe that using the strong law of large numbers and regularity of conditional probability
measures, $RS_{n1}/(np+b-2) \to 1$ a.s. (P) as $n \to \infty$, where $P$ denotes the probability measure
on $(\Omega, \mathcal{F})$, the basic probability space on which all the random variables are defined. It
remains to show that $n$ times $W_n$ converges to zero in probability (P) as $n \to \infty$. Since
$F_n = \Sigma+O(n^{-1})$, $n$ times the first term in the right hand side of (2.2.18) converges to zero
a.s. (P) as $n \to \infty$. Since $E(M^2) < \infty$, $\{E[M^2\mid\mathcal{A}_n], n \ge 1\}$ and $\{E[M\mid\mathcal{A}_n], n \ge 1\}$ are
uniformly integrable martingales. Hence $V[M\mid\mathcal{A}_n] \to V[M\mid\mathcal{A}_\infty]$ a.s. (P), where
$\mathcal{A}_\infty = \sigma(X_1, X_2, \ldots)$. Also, $V[M\mid\mathcal{A}_\infty] < \infty$ a.s. (P). Thus, $n$ times the last term in the right
hand side of (2.2.18) converges to zero a.s. as $n \to \infty$. Finally, since
$E(S_{n2}(M)) = E\{R^{-1}E[RS_{n2}(M)\mid R,M]\} = E\{R^{-1}p\} = pa/(b-2)$, it follows that
$E[S_{n2}(M)\mid\mathcal{A}_n] = O_p(1)$. Hence $n$ times the second term in the right hand side of (2.2.18)
converges to zero in probability. Thus, $nW_n \to 0$ in probability as $n \to \infty$.
Next, notice that using (2.2.16),

\[
E\big[\|\Theta-\hat\theta_\tau\|^2+c\tau\big] = E\big[\tau^{-1}U_\tau+W_\tau+c\tau\big].
\tag{2.2.19}
\]

Explicit determination of $\tau$ minimizing the right hand side of (2.2.19) is quite formidable. As
a good approximation to the Bayes stopping rule, we neglect the middle term, $W_\tau$, and
follow Bickel and Yahav (1967, 1968) to define the A.P.O. rule

\[
T = T_c = \inf\{n \ge n_0:\ U_n \le cn^2\}.
\tag{2.2.20}
\]

At this point, it suffices to take $n_0 = 1$. For later results, we need bigger $n_0$.
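The rule (2.2.20) depends on the data only through $S_{n1}$, so it is easy to simulate. The following sketch is an illustration under simplifying assumptions that are ours, not the text's: $\Sigma = I_p$ (so that $\operatorname{tr}\Sigma = p$ and coordinates can be drawn independently), fixed hypothetical values of $\theta$ and $r$, and $n_0 = 2$. It returns the stopping time together with the value $(\operatorname{tr}\Sigma/r)^{1/2}$ to which $c^{1/2}T$ should be close for small $c$.

```python
import math
import random


def apo_time(p, r, theta, c, b, n0=2, seed=0, n_max=1_000_000):
    """First n >= n0 with U_n <= c n^2, where U_n = tr(Sigma) S_n1 / (n p + b - 2).

    Assumes Sigma = I_p and b > 2; theta is a list of p hypothetical true means.
    """
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(r)
    mean = [0.0] * p
    s_n1 = 0.0  # running value of sum_i (x_i - xbar_n)^T (x_i - xbar_n)
    for n in range(1, n_max + 1):
        for j in range(p):
            x = rng.gauss(theta[j], sd)
            delta = x - mean[j]
            mean[j] += delta / n
            s_n1 += delta * (x - mean[j])  # coordinatewise Welford update
        u_n = p * s_n1 / (n * p + b - 2.0)  # tr(I_p) = p
        if n >= n0 and u_n <= c * n * n:
            return n
    raise RuntimeError("no stop before n_max")


def scaled_time_and_target(p=3, r=2.0, c=1e-6, b=4.0):
    """Return (c^{1/2} T, (tr Sigma / r)^{1/2}) for one simulated path."""
    t = apo_time(p, r, [0.0] * p, c, b)
    return math.sqrt(c) * t, math.sqrt(p / r)
```

Note that the prior on $M$ never enters the computation, in line with the form of $U_n$ in (2.2.17).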

Remark 2.1. It is important to observe that the A.P.O. rule given in (2.2.20) remains the
same irrespective of the prior distribution of $M$. This is because all the terms involving $M$ in
(2.2.16) have negligible contributions in comparison with the term $n^{-1}U_n$. Later in this
section we provide asymptotic Bayes risk expansions for $\hat\theta_T$ as well as for $\hat\theta_N$, where $N$
denotes the optimal (Bayes) stopping rule. The Bayes risks of both $\hat\theta_T$ and $\hat\theta_N$ turn out to
be of the form $\omega_1c^{1/2}+\omega_2c+o(c)$. The coefficients $\omega_1$ and $\omega_2$ for both $\hat\theta_N$ and $\hat\theta_T$ agree.
Hence, the A.P.O. rule, $T$, is asymptotically nondeficient in the sense of Woodroofe (1981).
Before proving the asymptotic nondeficiency of the proposed A.P.O. rule, we need a few
preliminary results, which we state in the form of two lemmas.
Lemma 2.1: Suppose $b > 2$. Then

(i) $U_T \to \operatorname{tr}(\Sigma)R^{-1}$ a.s. (P) as $c \to 0$;

(ii) $E(U_T) \to \operatorname{tr}(\Sigma)E(R^{-1})$ as $c \to 0$;

(iii) $c^{1/2}T \to \{\operatorname{tr}(\Sigma)\}^{1/2}R^{-1/2}$ a.s. (P) as $c \to 0$;

(iv) $E(c^{1/2}T) \to \{\operatorname{tr}(\Sigma)\}^{1/2}E(R^{-1/2})$ as $c \to 0$.


Proof: First we prove (i). Write, using the Helmert orthogonal transformation,
$Y_i = [X_1+\cdots+X_{i-1}-(i-1)X_i]/(i(i-1))^{1/2}$, $i = 2, 3, \ldots$. Then one has
$S_{n1} = \sum_{i=2}^n Y_i^T\Sigma^{-1}Y_i$. Note that conditional on $R = r$, the $Y_i$'s are iid $N_p(0, r^{-1}\Sigma)$, so that
the $Y_i^T\Sigma^{-1}Y_i$ are iid $r^{-1}\chi^2_p$. Hence, using the strong law of large numbers,
$U_n \to r^{-1}\operatorname{tr}(\Sigma)$ a.s. with respect to the conditional probability measure, as $n \to \infty$. The
conditional measure being regular, $U_n \to R^{-1}\operatorname{tr}(\Sigma)$ a.s. (P) as $n \to \infty$. In addition, since
$T \to \infty$ a.s. (P) as $c \to 0$, one gets part (i) of the lemma.
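The Helmert representation of $S_{n1}$ used above can be checked on a small data set. The sketch below takes $\Sigma = I_p$ for simplicity (an assumption made only for this illustration; the function names are also ours) and compares the centred sum of squares with $\sum_{i=2}^n Y_i^TY_i$.

```python
import math


def s_n1_direct(xs):
    """Sum over observations of (x_i - xbar_n)^T (x_i - xbar_n), with Sigma = I_p."""
    n, p = len(xs), len(xs[0])
    xbar = [sum(x[j] for x in xs) / n for j in range(p)]
    return sum((x[j] - xbar[j]) ** 2 for x in xs for j in range(p))


def s_n1_helmert(xs):
    """Same quantity via Y_i = [x_1 + ... + x_{i-1} - (i-1) x_i] / sqrt(i (i-1))."""
    p = len(xs[0])
    partial = [0.0] * p  # running sum x_1 + ... + x_{i-1}
    total = 0.0
    for i, x in enumerate(xs, start=1):
        if i >= 2:
            y = [(partial[j] - (i - 1) * x[j]) / math.sqrt(i * (i - 1)) for j in range(p)]
            total += sum(v * v for v in y)
        partial = [partial[j] + x[j] for j in range(p)]
    return total
```

Because each $Y_i$ involves only the first $i$ observations, the second form also shows how $S_{n1}$ can be updated one observation at a time.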

To prove part (ii), first observe that, conditional on $R = r$ $(> 0)$, $S_{n1}/(n-1)p$ is the
average of $(n-1)$ iid random variables each with first moment $r^{-1}$. An application of Doob's
maximal inequality for backward martingales now provides
$E\big[\sup_{n \ge n_0}S_{n1}/(n-1)p \mid R = r\big] \le kr^{-1}$ a.e. for $n_0 \ge 2$, where $k$ $(> 0)$ is a generic
constant which may depend on $n_0$ and $p$, but not on $r$. Now since $E[R^{-1}] < \infty$, one gets
$E\big[\sup_{n \ge n_0}S_{n1}/(n-1)p\big] < \infty$ for $n_0 \ge 2$. This implies immediately that
$E\big[\sup_{n \ge n_0}U_n\big] < \infty$ for $n_0 \ge 2$. Using part (i) of the lemma and the
dominated convergence theorem, one gets part (ii) of the lemma.

Next, to prove parts (iii) and (iv), one uses the inequalities
$U_T^{1/2} \le c^{1/2}T \le c^{1/2}+U_{T-1}^{1/2}$ (defining $U_0 = 0$) as well as parts (i) and (ii) of the lemma.
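Lemma 2.2: Suppose $E[M^2] < \infty$. Then

\[
V(M\mid\mathcal{A}_T) \to V(M\mid\Theta,R) \quad \text{a.s. (P) as } c \to 0.
\]

Proof: First note that

\[
\begin{aligned}
V(M\mid\mathcal{A}_n) &= E[M^2\mid\mathcal{A}_n]-E^2[M\mid\mathcal{A}_n] \\
&= E\{E[M^2\mid\mathcal{A}_n,\Theta,R]\mid\mathcal{A}_n\}-E^2\{E[M\mid\mathcal{A}_n,\Theta,R]\mid\mathcal{A}_n\} &\text{(2.2.21)} \\
&= E\{E[M^2\mid\Theta,R]\mid\mathcal{A}_n\}-E^2\{E[M\mid\Theta,R]\mid\mathcal{A}_n\} &\text{(2.2.22)} \\
&\to E\{E[M^2\mid\Theta,R]\mid\mathcal{A}_\infty\}-E^2\{E[M\mid\Theta,R]\mid\mathcal{A}_\infty\} &\text{(2.2.23)} \\
&= E[M^2\mid\Theta,R]-E^2[M\mid\Theta,R] = V(M\mid\Theta,R) \quad\text{a.s. (P) as } n \to \infty. &\text{(2.2.24)}
\end{aligned}
\]

Equation (2.2.22) follows upon examination of the joint distribution of $X_1,\ldots,X_n$, $M$, $\Theta$, and
$R$ (see (2.2.1)); (2.2.23) follows from the fact that given $E[M^2] < \infty$, $V(M\mid\mathcal{A}_n)$ is a positive
supermartingale and thus a.s. convergent; and (2.2.24) follows since the a.s. (P) convergence
of $\bar X_n$ to $\Theta$ and of $\operatorname{tr}(\Sigma)U_n^{-1}$ to $R$ implies that $\Theta$ and $R$ are $\mathcal{A}_\infty$ measurable, so that
$E[M^2\mid\Theta,R]$ and $E[M\mid\Theta,R]$ are $\mathcal{A}_\infty$ measurable. Now using $T \to \infty$ a.s. (P) as $c \to 0$, the
lemma follows from (2.2.24). The actual expression for $V(M\mid\Theta,R)$, though involved, can
easily be computed from the joint distribution of $\Theta$, $M$, and $R$.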

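We now proceed to prove the first main theorem of this section, which provides a
Bayes risk expansion of $\hat\theta_T$ in the form $\omega_1c^{1/2}+\omega_2c+o(c)$. (Note: Henceforth all almost sure
probability statements will be w.r.t. P, unless otherwise stated.)

Theorem 2.1: If $(n_0-1)p > 9$, $b > 2$, and $E(M^4) < \infty$, then

\[
\begin{aligned}
E\big[\|\Theta-\hat\theta_T\|^2+cT\big]
={}& 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\big\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\big\}\big(\tfrac{a}{2}\big)^{1/2} \\
&+ c\Big[(2p)^{-1}-\lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma)
 +\lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E\{RV(M\mid\Theta,R)\}\Big] \\
&+ o(c) \quad\text{as } c \to 0,
\end{aligned}
\tag{2.2.25}
\]

where we may recall that $G = D^{-1}\mathbf{1}_p\mathbf{1}_p^TD^{-1}$.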
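The first-order coefficient in (2.2.25) is $2(\operatorname{tr}\Sigma)^{1/2}E(R^{-1/2})$ with $R \sim \mathrm{Gamma}(\tfrac12 a, \tfrac12 b)$ in the rate-shape parameterization of Section 2.2. The sketch below evaluates it with `math.gamma` and cross-checks the closed form for $E(R^{-1/2})$ by simulation; the function names and the numerical values of $a$ and $b$ in the usage are illustrative assumptions. Note that `random.gammavariate` takes a shape and a scale, so the rate $\tfrac12 a$ enters as the scale $2/a$.

```python
import math
import random


def expected_inv_sqrt_r(a, b):
    """E(R^{-1/2}) for R with pdf exp(-alpha z) z^(beta-1) alpha^beta / Gamma(beta),
    alpha = a/2, beta = b/2 (requires b > 1)."""
    return math.sqrt(a / 2.0) * math.gamma((b - 1.0) / 2.0) / math.gamma(b / 2.0)


def first_order_coefficient(tr_sigma, a, b):
    """Coefficient of c^{1/2} in the risk expansion (2.2.25)."""
    return 2.0 * math.sqrt(tr_sigma) * expected_inv_sqrt_r(a, b)


def monte_carlo_inv_sqrt_r(a, b, draws=200000, seed=0):
    """Simulation estimate of E(R^{-1/2}); gammavariate(shape, scale) with scale = 2/a."""
    rng = random.Random(seed)
    return sum(rng.gammavariate(b / 2.0, 2.0 / a) ** -0.5 for _ in range(draws)) / draws
```

With $a = 2$ and $b = 6$ the closed form reduces to $\Gamma(2.5)/\Gamma(3)$, and the simulated average should agree with it to about two decimal places.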


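Proof: Let $W_n = E[R^{-1}\mid\mathcal{A}_n]$. Using (2.2.14) we write

\[
\begin{aligned}
E\big[\|\Theta-\hat\theta_T\|^2+cT\big]
&= E\big\{E\big[\|\Theta-\hat\theta_T\|^2\mid\mathcal{A}_T\big]+cT\big\} \\
&= E\big[T^{-1}(\operatorname{tr}F_T)W_T+T^{-2}\lambda^2\operatorname{tr}(F_TGF_T)V(M\mid\mathcal{A}_T)+cT\big]
 &\text{(2.2.26)} \\
&= E\big[T^{-1}(\operatorname{tr}\Sigma)W_T+T^{-1}(\operatorname{tr}(F_T)-\operatorname{tr}(\Sigma))W_T \\
&\qquad+T^{-2}\lambda^2\operatorname{tr}(F_TGF_T)V(M\mid\mathcal{A}_T)+cT\big]
 &\text{(2.2.27)} \\
&= E\Big[2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}E(R^{-1/2}\mid\mathcal{A}_T)
 +2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\big\{W_T^{1/2}-E(R^{-1/2}\mid\mathcal{A}_T)\big\} \\
&\qquad+T^{-1}\big\{(\operatorname{tr}\Sigma)^{1/2}W_T^{1/2}-c^{1/2}T\big\}^2
 +T^{-1}[\operatorname{tr}(F_T)-\operatorname{tr}(\Sigma)]W_T \\
&\qquad+T^{-2}\lambda^2\operatorname{tr}(F_TGF_T)V(M\mid\mathcal{A}_T)\Big].
 &\text{(2.2.28)}
\end{aligned}
\]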


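Note that in going from (2.2.27) to (2.2.28) we have applied the basic algebraic identity in
Woodroofe (1981) to $T^{-1}(\operatorname{tr}\Sigma)W_T+cT$. In view of the decomposition in (2.2.28) we prove
the theorem by showing that as $c \to 0$,

\[
E\big[2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}E(R^{-1/2}\mid\mathcal{A}_T)\big]
 = 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\big\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\big\}\big(\tfrac{a}{2}\big)^{1/2};
\tag{2.2.29}
\]

\[
E\big[2c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}\big\{W_T^{1/2}-E[R^{-1/2}\mid\mathcal{A}_T]\big\}\big] \to (2p)^{-1};
\tag{2.2.30}
\]

\[
E\big[c^{-1}T^{-1}(\operatorname{tr}F_T-\operatorname{tr}\Sigma)W_T\big] \to -\lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}\Sigma;
\tag{2.2.31}
\]

\[
E\big[c^{-1}T^{-2}\lambda^2\operatorname{tr}(F_TGF_T)V(M\mid\mathcal{A}_T)\big]
 \to \lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}\Sigma\}E[RV(M\mid\Theta,R)];
\tag{2.2.32}
\]

\[
E\big[c^{-1}T^{-1}\big\{(\operatorname{tr}\Sigma)^{1/2}W_T^{1/2}-c^{1/2}T\big\}^2\big] \to 0.
\tag{2.2.33}
\]

This is done in Appendix A.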
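The algebraic step between (2.2.27) and (2.2.28) rests on the identity $t^{-1}sw+ct = 2c^{1/2}(sw)^{1/2}+t^{-1}\{(sw)^{1/2}-c^{1/2}t\}^2$ for positive $t$, $w$, $c$ and $s$, here with $s = \operatorname{tr}\Sigma$, $w = W_T$ and $t = T$. The short check below evaluates both sides; the function names and the test values are ours and serve only as an illustration.

```python
import math


def risk_lhs(t, w, c, s):
    """t^{-1} s w + c t, the term to which the identity is applied."""
    return s * w / t + c * t


def risk_rhs(t, w, c, s):
    """2 c^{1/2} (s w)^{1/2} + t^{-1} {(s w)^{1/2} - c^{1/2} t}^2."""
    root = math.sqrt(s * w)
    return 2.0 * math.sqrt(c) * root + (root - math.sqrt(c) * t) ** 2 / t
```

The squared term vanishes exactly when $ct^2 = sw$, which is the balance that the stopping rule (2.2.20) aims for.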

Remark 2.2. It is possible to modify the stopping rule given in (2.2.20) and get the same
conclusion as given in Theorem 2.1. For example, defining $U_n^* = (\operatorname{tr}\Sigma)S_{n1}/(n-1)p$ $(n \ge 2)$,
and the stopping rule $T^*$ as

\[
T^* = \inf\{n \ge n_0:\ U_n^* \le cn^2\},
\tag{2.2.34}
\]

we get a Robbins-type stopping rule as proposed by Ghosh, Sinha, and Mukhopadhyay (1976)
(see also Ghosh, Nickerson, and Sen, 1987). However, examining the proof of Theorem 2.1, it
is clear that $E\big[\|\Theta-\hat\theta_{T^*}\|^2+cT^*\big]$ has the same expansion as that given in (2.2.25), under
the same conditions. Alternatively, if we define

\[
T^* = \inf\{n \ge n_0:\ (\operatorname{tr}\Sigma)W_n \le cn^2\},
\tag{2.2.35}
\]

a natural prior dependent stopping rule, we again get the same expansion, using arguments
similar to those of Theorem 2.1.

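We shall now prove that if $N = N_c$ denotes the Bayes stopping rule, then
$E\big[\|\Theta-\hat\theta_N\|^2+cN\big]$ has the same expression as given in the r.h.s. of (2.2.25).

Theorem 2.2: If $(n_0-1)p > 9$, $b > 2$ and $E(M^4) < \infty$, then for the Bayes stopping rule $N$,

\[
\begin{aligned}
E\big[\|\Theta-\hat\theta_N\|^2+cN\big]
={}& 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\big\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\big\}\big(\tfrac{a}{2}\big)^{1/2} \\
&+ c\Big[(2p)^{-1}-\lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma)
 +\lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\Theta,R)]\Big] \\
&+ o(c) \quad\text{as } c \to 0,
\end{aligned}
\tag{2.2.36}
\]

where $G = D^{-1}\mathbf{1}_p\mathbf{1}_p^TD^{-1}$.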


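Proof: The decomposition of Theorem 2.1, equation (2.2.28), is valid here, with "N" replacing "T". It follows that to establish this theorem, it suffices to prove a set of asymptotic relationships as in (2.2.29) - (2.2.33):

\[ E\left[2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}E(R^{-1/2}\mid\mathcal{A}_N)\right] = 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\}(a/2)^{1/2}, \tag{2.2.37} \]

\[ E\left[2c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}\{W_N^{1/2} - E(R^{-1/2}\mid\mathcal{A}_N)\}\right] \to (2p)^{-1}, \tag{2.2.38} \]

\[ E\left[c^{-1}N^{-1}(\operatorname{tr}F_N - \operatorname{tr}\Sigma)W_N\right] \to -\lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma), \tag{2.2.39} \]

\[ E\left[c^{-1}N^{-2}\lambda^2\operatorname{tr}(F_NGF_N)V(M\mid\mathcal{A}_N)\right] \to \lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\theta,R)], \tag{2.2.40} \]

\[ E\left[c^{-1}N^{-1}\{(\operatorname{tr}\Sigma)^{1/2}W_N^{1/2} - c^{1/2}N\}^2\right] \to 0, \quad\text{as } c \to 0. \tag{2.2.41} \]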


Here, as in Theorem 2.1, we need the behavior of $N$ to even get started. Write $L_n(c) = E[\|\theta - \hat\theta_n\|^2 + cn]$. Following Bickel and Yahav (1967), it can be shown that for any stopping rule $\tau = \tau_c$, $L_\tau(c) \sim \inf_n L_n(c)$ if and only if $\tau_c \sim T_c$ a.s. as $c \to 0$. $N$ being the Bayes rule, we must have $N_c \sim T_c$ a.s. as $c \to 0$. Then, using Lemma 2.1, $cN^2 \to (\operatorname{tr}\Sigma)R^{-1}$ a.s. as $c \to 0$ ($N_c \to \infty$ a.s. as $c \to 0$). This behavior for $N$ is sufficient to establish the appropriate pointwise convergences associated with the integrands of (2.2.37) - (2.2.40). (See the proof of Theorem 2.1.) We will find that (2.2.41) is forced, given the other relationships, and the fact that $N$ is Bayes. It remains to establish uniform integrabilities. The details are provided in Appendix B.


Remark 2.3. It follows from Theorems 2.1 and 2.2 that the Bayes risks of the A.P.O. rule $T$ and the Bayes rule $N$ agree up to the coefficient of $c$. Thus the proposed A.P.O. rule is asymptotically nondeficient in the sense of Woodroofe (1981). It is apparent that the rule $T^*$ of Ghosh, Sinha, and Mukhopadhyay is also nondeficient.

We  now  turn  to  procedures  that  follow  from  a  variant  of  the  Section  2.2  model  that 
puts  an  improper  uniform  prior  on  M,  as  is  often  done  in  hierarchical  Bayes  analysis  (see, 
e.g.,  Lindley  and  Smith,  1972). 

2.3     A.P.O. Rule Under an Improper Prior

Consider  the  following  variant  of  the  hierarchical  Bayes  model  given  in  Section  2.2. 

As before, conditional on $\theta$ and $R = r$, let $X_1, X_2, \ldots$ be iid $N_p(\theta, r^{-1}\Sigma)$, and conditional on $M = m$ and $R = r$, let $\theta \sim N_p(m\mathbf{1}_p, (\lambda r)^{-1}D)$, where $\lambda$ ($> 0$) is known, and $\Sigma$ and $D$ are known positive definite matrices. Marginally $M$ and $R$ are assumed to be independently distributed with $R \sim \text{Gamma}(\tfrac12 a, \tfrac12 b)$, but $M \sim \text{uniform}(-\infty, \infty)$. The resulting prior distribution for $\theta$ will be improper, but the posterior distribution of $\theta$ given $X_i = x_i$ ($1 \le i \le n$) will be proper for every $n \ge 1$. It follows that the Bayes risk of any procedure under this model is infinite. We shall proceed formally, to develop our estimator and stopping rule, and then evaluate their performance in the context of the model of Section 2.2.

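In order to motivate the A.P.O. rule we must first find the formal posterior distribution of $\theta$ given $X_i = x_i$ ($1 \le i \le n$). Note that the joint pdf of $X_1, X_2, \ldots, X_n$, $\theta$, $M$ and $R$ is given by

\[ f(x_1, \ldots, x_n, \theta, m, r) \propto r^{np/2}\exp\Big[-\tfrac12 r\sum_{i=1}^n (x_i - \theta)^T\Sigma^{-1}(x_i - \theta)\Big] \times r^{p/2}\exp\Big[-\tfrac12\lambda r(\theta - m\mathbf{1}_p)^TD^{-1}(\theta - m\mathbf{1}_p)\Big]\exp(-\tfrac12 ar)\,r^{b/2-1}. \tag{2.3.1} \]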


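Writing $C = D^{-1} - (\mathbf{1}_p^TD^{-1}\mathbf{1}_p)^{-1}D^{-1}\mathbf{1}_p\mathbf{1}_p^TD^{-1}$, it follows that

\[ (\theta - m\mathbf{1}_p)^TD^{-1}(\theta - m\mathbf{1}_p) = (\mathbf{1}_p^TD^{-1}\mathbf{1}_p)\left[m - (\mathbf{1}_p^TD^{-1}\mathbf{1}_p)^{-1}(\mathbf{1}_p^TD^{-1}\theta)\right]^2 + \theta^TC\theta. \tag{2.3.2} \]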


Note that $C$ is singular since $C\mathbf{1}_p = 0$. Also $C$ is nonnegative definite (n.n.d.) since for every $p \times 1$ vector $a$,

\[ a^TCa = a^TD^{-1}a - \frac{(a^TD^{-1}\mathbf{1}_p)^2}{\mathbf{1}_p^TD^{-1}\mathbf{1}_p} \ge 0, \]

using the Schwarz inequality. Thus $\theta$ has an improper prior distribution, since the prior pdf is proportional to $\exp(-\tfrac12\lambda r\,\theta^TC\theta)$.
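
The two stated properties of $C$ (that $C\mathbf{1}_p = 0$ and that $C$ is n.n.d.) are easy to check numerically. The sketch below does so under the simplifying assumption that $D$ is diagonal, so that $D^{-1}$ is available in closed form; the helper names `make_c` and `quad_form` and the diagonal entries are hypothetical.

```python
import random

def make_c(d):
    """C = D^{-1} - (1'D^{-1}1)^{-1} D^{-1} 1 1' D^{-1} for diagonal D = diag(d)."""
    p = len(d)
    s = sum(1.0 / dj for dj in d)                      # 1' D^{-1} 1
    return [[(1.0 / d[i] if i == j else 0.0) - 1.0 / (d[i] * d[j] * s)
             for j in range(p)] for i in range(p)]

def quad_form(c, a):
    """Quadratic form a' C a."""
    n = len(a)
    return sum(a[i] * c[i][j] * a[j] for i in range(n) for j in range(n))

d = [1.0, 2.0, 0.5, 4.0]                               # illustrative diagonal of D
C = make_c(d)
row_sums = [sum(row) for row in C]                     # entries of C 1_p
rng = random.Random(0)
min_q = min(quad_form(C, [rng.uniform(-5, 5) for _ in d]) for _ in range(1000))
print(row_sums, min_q)   # row sums vanish; the smallest quadratic form is >= 0
```

Because $C$ annihilates $\mathbf{1}_p$, the prior carries no information about the common level of the components of $\theta$, which is why it is improper.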

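Integrating the joint pdf given in (2.3.1) with respect to $m$, and using (2.3.2), one finds the joint pdf of $X_1, \ldots, X_n$, $\theta$, and $R$ given by

\[ f(x_1, \ldots, x_n, \theta, r) \propto r^{np/2}\exp\Big[-\tfrac12 r\sum_{i=1}^n (x_i - \theta)^T\Sigma^{-1}(x_i - \theta)\Big] \times r^{(p-1)/2}\exp\left[-\tfrac12\lambda r(\theta^TC\theta)\right]\exp(-\tfrac12 ar)\,r^{b/2-1}. \tag{2.3.3} \]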


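Using (2.2.2) and writing $Q_n = n\Sigma^{-1} + \lambda C$, it follows after some algebra that

\[ n(\bar x_n - \theta)^T\Sigma^{-1}(\bar x_n - \theta) + \lambda\theta^TC\theta = \theta^TQ_n\theta - 2\theta^T\Sigma^{-1}(n\bar x_n) + n\bar x_n^T\Sigma^{-1}\bar x_n. \tag{2.3.4} \]

Since $\Sigma^{-1}$ is p.d. and $C$ n.n.d., $Q_n$ is p.d. for every $n \ge 1$. Hence

\[ \text{r.h.s. of (2.3.4)} = \left[\theta - Q_n^{-1}\Sigma^{-1}(n\bar x_n)\right]^TQ_n\left[\theta - Q_n^{-1}\Sigma^{-1}(n\bar x_n)\right] + n\bar x_n^T(\Sigma^{-1} - n\Sigma^{-1}Q_n^{-1}\Sigma^{-1})\bar x_n. \tag{2.3.5} \]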


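We write $s_{n1} = \sum_{i=1}^n (x_i - \bar x_n)^T\Sigma^{-1}(x_i - \bar x_n)$ and $s^*_{n2} = n\bar x_n^T(\Sigma^{-1} - n\Sigma^{-1}Q_n^{-1}\Sigma^{-1})\bar x_n$, and denote by $S_{n1}$ and $S^*_{n2}$ the corresponding random variables. Then from (2.3.3) - (2.3.5) one gets

\[ f(x_1, \ldots, x_n, \theta, r) \propto r^{p/2}\exp\left[-\tfrac12 r\{\theta - Q_n^{-1}\Sigma^{-1}(n\bar x_n)\}^TQ_n\{\theta - Q_n^{-1}\Sigma^{-1}(n\bar x_n)\}\right] \times r^{(np-1)/2}\exp\left[-\tfrac12 r(s_{n1} + s^*_{n2} + a)\right]r^{b/2-1}. \tag{2.3.6} \]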


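Formula (2.3.6) leads to two important conclusions. First, conditional on $X_i = x_i$ ($i = 1, \ldots, n$) and $R = r$, $\theta \sim N_p(\tilde\theta_n, r^{-1}Q_n^{-1})$, where $\tilde\theta_n = Q_n^{-1}\Sigma^{-1}(n\bar x_n)$. Note that $\tilde\theta_n$ does not depend on $r$. Second, observe that the joint marginal pdf of $X_1, \ldots, X_n$ and $R$ is given by

\[ f(x_1, \ldots, x_n, r) \propto r^{(np+b-3)/2}\exp\left[-\tfrac12 r(s_{n1} + s^*_{n2} + a)\right]. \tag{2.3.7} \]

It follows from (2.3.7) that the conditional pdf of $R$ given $X_i = x_i$ ($i = 1, \ldots, n$) is Gamma$(\tfrac12(s_{n1} + s^*_{n2} + a), \tfrac12(np+b-1))$. Thus, formally,

\[ E(\|\theta - \tilde\theta_n\|^2 \mid \mathcal{A}_n) = \operatorname{tr}(Q_n^{-1})E[R^{-1}\mid\mathcal{A}_n] = \operatorname{tr}(Q_n^{-1})\{(S_{n1} + S^*_{n2} + a)/(np+b-3)\}. \tag{2.3.8} \]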


The next step is to establish a stopping rule based on a dominant term from (2.3.8). To obtain such a dominant term we need a valid probability model. We assume the actual observations, $X_i$, $i \ge 1$, come from the model of Section 2.2. Then, via Lemma 2.1, $S_{n1}/(np+b-3) \to R^{-1}$ a.s. as $n \to \infty$. Also, writing $C = SS^T$, and applying Exercise 2.9, p. 33 of Rao (1973) again, one can write

\[ S^*_{n2} = \lambda\bar X_n^TS(\lambda n^{-1}S^T\Sigma S + I)^{-1}S^T\bar X_n \le K\bar X_n^T\Sigma^{-1}\bar X_n. \tag{2.3.9} \]

Thus $S^*_{n2}$ is $O_p(1)$ since $\bar X_n^T\Sigma^{-1}\bar X_n$ is a backward submartingale. Finally, $\operatorname{tr}(Q_n^{-1}) \sim \operatorname{tr}(n^{-1}\Sigma)$. We have as a dominant term $n^{-1}(\operatorname{tr}\Sigma)\{S_{n1}/(np+b-3)\}$. But this is essentially the same as $n^{-1}U_n$ from Section 2.2, and so we propose the same stopping rule $T$ as given in (2.2.20). Thus, our procedure is to stop at time $T$, and then estimate $\theta$ by $\tilde\theta_T$.

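Remark 2.4. In the special case when $\Sigma = D = I_p$, the above procedure can be compared to a recent A.P.O. rule of Martinsek (1987). To see this, first note that when $\Sigma = D = I_p$, using the formula (see Rao, 1973, p. 33, Exercise 2.8)

\[ (A + uv^T)^{-1} = A^{-1} - (1 + v^TA^{-1}u)^{-1}(A^{-1}uv^TA^{-1}), \]

where both $A$ and $A + uv^T$ are invertible, one gets

\[ (n\Sigma^{-1} + \lambda C)^{-1} = \left[(n+\lambda)I_p - \lambda p^{-1}\mathbf{1}_p\mathbf{1}_p^T\right]^{-1} = (n+\lambda)^{-1}\left[I_p + \lambda(np)^{-1}\mathbf{1}_p\mathbf{1}_p^T\right]. \tag{2.3.10} \]

Hence

\[ \operatorname{tr}(\Sigma^{-1} + \lambda n^{-1}C)^{-1} = n(n+\lambda)^{-1}(p + \lambda n^{-1}) = (n+\lambda)^{-1}(np+\lambda). \tag{2.3.11} \]

Moreover the expression for $S^*_{n2}$ simplifies to

\[ S^*_{n2} = n\bar X_n^T\left[I_p - n\{(n+\lambda)I_p - \lambda p^{-1}\mathbf{1}_p\mathbf{1}_p^T\}^{-1}\right]\bar X_n = (n+\lambda)^{-1}n\lambda\{\|\bar X_n\|^2 - p^{-1}(\bar X_n^T\mathbf{1}_p)^2\} = (n+\lambda)^{-1}n\lambda\sum_{j=1}^p(\bar X_{nj} - \bar{\bar X}_n)^2, \tag{2.3.12} \]

where $\bar{\bar X}_n = p^{-1}\sum_{j=1}^p\bar X_{nj}$. Using (2.3.11) and (2.3.12) one gets

\[ E(\|\theta - \tilde\theta_n\|^2\mid\mathcal{A}_n) = n^{-1}(n+\lambda)^{-1}(np+\lambda)(np+b-3)^{-1}\Big[\sum_{i=1}^n\|X_i - \bar X_n\|^2 + (n+\lambda)^{-1}n\lambda\sum_{j=1}^p(\bar X_{nj} - \bar{\bar X}_n)^2 + a\Big]. \tag{2.3.13} \]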


Identifying  our  A,  a,  and  b  with  Iq,  b^  and  aQ  respectively  of  Martinsek  (1987),  it  follows 
that  the  rhs  expression  given  in  (2.3.13)  is  similar  to  the  tJn  given  in  p.  133  of  Martinsek 
(1987)  when  p  =  1.  One  basic  difference  is  that  instead  of  Xn  in  (2.3.13),  Martinsek  uses  a  Y 
which  is  independent  of  the  X-'s,  and  uses  this  independence  in  his  proof.  The  other  major 
difference is that Martinsek (1987) uses the entire expression (similar to the rhs of (2.3.13)) given on his p. 133 in defining his A.P.O. rule, in the spirit of the rule proposed at (2.2.35). We use the dominant term $\sum_{i=1}^n\|X_i - \bar X_n\|^2/(np+b-2)$ in defining our rule. We shall also show that our procedure is not asymptotically more deficient than that of Martinsek (1987) because of the extra simplification in defining the A.P.O. rule (see Remark 2.5).
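
The rank-one inverse in (2.3.10) can be confirmed without inverting anything, by multiplying $Q_n = nI_p + \lambda C$ by the claimed inverse. The following sketch does this for $\Sigma = D = I_p$; the function name `check_inverse` and the values of $n$, $\lambda$ and $p$ are illustrative assumptions.

```python
def check_inverse(n, lam, p, tol=1e-12):
    """Check (2.3.10) for Sigma = D = I_p by forming Q_n times its claimed inverse."""
    # Q_n = n I_p + lam * C, with C = I_p - p^{-1} 1 1'
    q = [[(n + lam if i == j else 0.0) - lam / p for j in range(p)] for i in range(p)]
    # claimed inverse: (n + lam)^{-1} [I_p + lam (n p)^{-1} 1 1']
    inv = [[((1.0 if i == j else 0.0) + lam / (n * p)) / (n + lam) for j in range(p)]
           for i in range(p)]
    prod = [[sum(q[i][k] * inv[k][j] for k in range(p)) for j in range(p)]
            for i in range(p)]
    ok = all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < tol
             for i in range(p) for j in range(p))
    trace = sum(inv[i][i] for i in range(p))     # tr(Q_n^{-1})
    return ok, trace

ok, trace = check_inverse(n=7, lam=2.5, p=4)
print(ok, trace, (7 * 4 + 2.5) / (7 * (7 + 2.5)))
```

The trace returned is $\operatorname{tr}(Q_n^{-1}) = (np+\lambda)/\{n(n+\lambda)\}$, which is the factor multiplying $E[R^{-1}\mid\mathcal{A}_n]$ in the formal posterior risk.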

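We now develop the asymptotic Bayes risk expansion of our "noninformative" procedure, say $(T, \tilde\theta_T)$, under the general model of Section 2.2.

Theorem 2.3: Consider the model as proposed in Section 2.2. If $(n_0-1)p > 9$, $b > 2$, and $E(M^4) < \infty$, then

\[ E\left[\|\theta - \tilde\theta_T\|^2 + cT\right] = 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\}(a/2)^{1/2} + c\left[(2p)^{-1} - \lambda\operatorname{tr}(\Sigma C\Sigma)/\operatorname{tr}(\Sigma)\right] + o(c) \quad\text{as } c \to 0. \tag{2.3.14} \]

Proof: Using standard Bayesian arguments,

\[ E\left[\|\theta - \tilde\theta_T\|^2 + cT\right] = E\left[\|\theta - \hat\theta_T\|^2 + cT\right] + E\left[\|\hat\theta_T - \tilde\theta_T\|^2\right]. \tag{2.3.15} \]

In view of Theorem 2.1, it suffices to show that

\[ c^{-1}E\left[\|\hat\theta_T - \tilde\theta_T\|^2\right] \to \lambda\operatorname{tr}\{\Sigma(D^{-1} - C)\Sigma\}/\operatorname{tr}(\Sigma) - \lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E\{RV(M\mid\theta,R)\}. \tag{2.3.16} \]

This is done in Appendix C.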


Remark 2.5. It follows, upon comparison of Theorems 2.1 and 2.3, that the cost of using a "diffuse" prior procedure when $M$ in fact possesses a proper distribution (and 4 moments) is

\[ c\left[\lambda\operatorname{tr}\{\Sigma(D^{-1} - C)\Sigma\}/\operatorname{tr}(\Sigma) - \lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E\{RV(M\mid\theta,R)\}\right] + o(c) \tag{2.3.17} \]

as $c \to 0$. For purposes of comparison we note that when $M$ is degenerate (i.e. $P(M = m) = 1$ for some $m$), (2.3.17) becomes, after some calculations (and noting that the terms from Theorems 2.1 and 2.3 involving $V(M\mid\theta,R)$ are zeroes),

\[ c\lambda\operatorname{tr}(\Sigma D^{-1}\mathbf{1}_p\mathbf{1}_p^TD^{-1}\Sigma)/\{(\operatorname{tr}\Sigma)(\mathbf{1}_p^TD^{-1}\mathbf{1}_p)\} + o(c). \tag{2.3.18} \]

Further simplifying to the case $D = \Sigma = I_p$, (2.3.18) becomes $c\lambda p^{-1} + o(c)$. This agrees with the findings of Martinsek (1987). Thus, we may conclude that even for known $M$, the proposed diffuse prior procedure is first order efficient, but is not nondeficient unless $p = p_c \to \infty$ as $c \to 0$. The same phenomenon appears in Martinsek (1987).

Remark 2.6. At this point it is useful to consider the performance of $(T, \bar X_T)$ relative to $(T, \hat\theta_T)$ (or equivalently the $T^*$ case of Ghosh et al., 1976). We find that both procedures are first order efficient, but $\hat\theta_T$ has a larger second order efficiency than $\bar X_T$. This follows from the theorem below.

Theorem 2.4: Consider the hierarchical model proposed in Section 2.2. If $(n_0-1)p > 9$, $b > 2$, and $E(M^4) < \infty$, then

\[ E\left[\|\theta - \bar X_T\|^2 + cT\right] = 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\}(a/2)^{1/2} + c(2p)^{-1} + o(c) \quad\text{as } c \to 0. \]

Proof: Standard Bayesian calculations lead to

\[ E\left[\|\theta - \bar X_T\|^2 + cT\right] = E\left[\|\theta - \hat\theta_T\|^2 + cT\right] + E\left[\|\hat\theta_T - \bar X_T\|^2\right]. \]

In view of Theorem 2.1, it suffices to show that

\[ c^{-1}E\left[\|\hat\theta_T - \bar X_T\|^2\right] \to \lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma) - \lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E\{RV(M\mid\theta,R)\} \quad\text{as } c \to 0. \tag{2.3.19} \]

This is done in Appendix D.

2.4     Summary   and    Comparison 
It is useful at this point to collect our results for an overall comparison. We have three "competing" procedures: $(T, \hat\theta_T)$, $(T, \tilde\theta_T)$ and $(T, \bar X_T)$. They can be viewed as methods based on decreasing awareness of, or confidence in, the "true" model. Their asymptotic performances take the following forms:

\[ E\left[\|\theta - \hat\theta_T\|^2 + cT\right] = \omega_1c^{1/2} + c\left[(2p)^{-1} - \lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma) + \lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\theta,R)]\right] + o(c) \]
\[ \qquad = \omega_1c^{1/2} + c\left[(2p)^{-1} - \lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma) + \lambda\{\operatorname{tr}(\Sigma(D^{-1} - C)\Sigma)/\operatorname{tr}(\Sigma)\}E[V(M\mid\theta,R)/V^*]\right] + o(c), \tag{2.4.1} \]

where $V^* = \{(\mathbf{1}_p^TD^{-1}\mathbf{1}_p)\lambda R\}^{-1} = V(M\mid\theta,R)$ under a Uniform$(-\infty, \infty)$ prior,

\[ E\left[\|\theta - \tilde\theta_T\|^2 + cT\right] = \omega_1c^{1/2} + c\left[(2p)^{-1} - \lambda\operatorname{tr}(\Sigma C\Sigma)/\operatorname{tr}(\Sigma)\right] + o(c), \tag{2.4.2} \]

\[ E\left[\|\theta - \bar X_T\|^2 + cT\right] = \omega_1c^{1/2} + c(2p)^{-1} + o(c). \tag{2.4.3} \]

A comparison of the second expression for the risk of $(T, \hat\theta_T)$ and the expression for the risk of $(T, \tilde\theta_T)$ shows the effect of the "lost" information concerning $M$. If $g(m)$ expresses a precise knowledge of $M$, then $V(M\mid\theta,R)/V^*$ will tend to be close to zero. If $g(m)$ expresses vague knowledge of $M$, then $V(M\mid\theta,R)/V^*$ will tend to be close to one, in which case the two risk expansions would coincide. A comparison of the asymptotic risks of either $(T, \hat\theta_T)$ or $(T, \tilde\theta_T)$ and $(T, \bar X_T)$ brings out the role of $\lambda$ in the model. Since $\lambda$ measures the variation in the $X$'s given $\theta$, $R$ relative to the variation in $\theta$ given $M$, $R$, we see that the use of posterior means in estimating $\theta$ is most advantageous when $\lambda$ is large; that is, when there is relatively more variation in the $X$'s than in $\theta$. Large values of $\lambda$ correspond to more precise priors on $\theta$. Recall that

\[ \hat\theta_T = (T\Sigma^{-1} + \lambda D^{-1})^{-1}(T\Sigma^{-1}\bar X_T + \lambda E(M\mid\mathcal{A}_T)D^{-1}\mathbf{1}_p), \]

and thus that $\bar X_T$ appears in the limit as $\lambda \to 0$. Similarly, $\bar X_T$ appears as the limit of $\tilde\theta_T$ as $\lambda \to 0$.
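
To make the comparison concrete, the coefficients of $c$ in (2.4.1) - (2.4.3) can be evaluated in the simplest special case. The sketch below assumes $\Sigma = D = I_p$ and a degenerate $M$ (so that $V(M\mid\theta,R) = 0$); the function name `second_order_coefficients` and the values $p = 4$, $\lambda = 2$ are illustrative only.

```python
from fractions import Fraction

def second_order_coefficients(p, lam):
    """Coefficients of c in (2.4.1)-(2.4.3) when Sigma = D = I_p and M is degenerate."""
    p, lam = Fraction(p), Fraction(lam)
    base = 1 / (2 * p)
    # tr(Sigma D^{-1} Sigma) / tr(Sigma) = 1 and V(M | theta, R) = 0
    posterior_mean = base - lam
    # C = I_p - p^{-1} 1 1', so tr(Sigma C Sigma) / tr(Sigma) = (p - 1) / p
    improper_prior = base - lam * (p - 1) / p
    sample_mean = base
    return posterior_mean, improper_prior, sample_mean

pm, ip, sm = second_order_coefficients(p=4, lam=2)
print(pm, ip, sm, ip - pm)   # the last value is lambda / p, as in Remark 2.5
```

In this case the three coefficients are ordered, with the posterior mean procedure smallest, and the gap between the improper prior and posterior mean procedures equals $\lambda p^{-1}$.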


CHAPTER  3 
ASYMPTOTICALLY   POINTWISE   OPTIMAL   STOPPING   RULES 
FOR  THE   ESTIMATION   OF   A   VECTOR  OF 
REGRESSION   PARAMETERS   UNDER  A   HIERARCHICAL   BAYES   MODEL 


3.1     Introduction 

In this chapter we consider the estimation of a vector of regression parameters, $\beta$, from a general linear model with multivariate normal error structure, under a hierarchical Bayes model. We proceed as in Chapter 2, noting that the regression model can be considered a generalization of the model of that chapter. Section 3.2 describes the model and develops the A.P.O. stopping rule. Then, Theorem 3.1 gives an asymptotic expansion for the procedure which uses the posterior mean estimator. In Theorem 3.2 we show that the procedure is asymptotically nondeficient, again showing an expansion for the sequential Bayes procedure. In Section 3.3 we look at an improper prior based estimator and the weighted least squares estimator, in conjunction with the A.P.O. stopping rule. Risk expansions for these two procedures, under the original model, appear in Theorems 3.3 and 3.4. In Section 3.4 we compare the performances of the procedures.

Before  continuing,  we  should  note  one  of  the  new  features  of  our  regression  scenario. 
Observations  will  come  at  different  points  in  "design"  space,  and  will  not  be  identically 
distributed.  We  will  require  some  additional  constraints,  and  some  additional  tools  to  handle 
this fact. It is convenient to give some of the tools now. The first proposition provides for considerable simplification of argument, and is, perhaps, not well known.

Proposition 3.1: Suppose we observe a sequence of random ($k \times 1$) vectors, $\{Y_n, n \ge 1\}$, generated via the linear model

\[ Y_i = X_i\beta + \varepsilon_i, \]

where the $X_i$'s are ($k \times p$) matrices of rank $p$, with $k \ge p$. Also, the $\varepsilon_i$ are independently distributed as $N_k(0, \sigma^2V)$ random vectors. Then the error sum of squares, SSE, has the following representation:

\[ \text{SSE} = \sum_{i=1}^n (Y_i - X_i\hat\beta_n^{W})^TV^{-1}(Y_i - X_i\hat\beta_n^{W}) = Z_1 + \sum_{i=2}^n Z_i, \]

where $\hat\beta_n^{W}$ is the weighted least squares estimator of $\beta$, and where the $Z$'s are independently distributed with $Z_1 \sim \sigma^2\chi^2_{k-p}$ and $Z_i \sim \sigma^2\chi^2_k$, $i = 2, \ldots, n$.

Proof: See Finster (1983), Section 3.

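The second proposition provides an integrability result for dealing with averages of independent, nonidentically distributed random variables, when we do not have a backward martingale structure.

Proposition 3.2: Suppose we have a sequence of independent random variables $\{Y_i, i \ge 1\}$ such that $\sup_{i\ge1}E|Y_i|^\nu < \infty$, $\nu$ an integer $\ge 2$. Then, for $0 < r < \nu$,

\[ E\Big[\sup_{n\ge1}\Big|n^{-1}\sum_{i=1}^nY_i\Big|^r\Big] < \infty. \]

Proof: First we show that it suffices to consider $\{Y_i, i \ge 1\}$ such that $EY_i = 0$, all $i$. Let $EY_i = \mu_i$. Then

\[ E\Big[\sup_{n\le m}\Big|n^{-1}\sum_{i=1}^nY_i\Big|^r\Big] = E\Big[\sup_{n\le m}\Big|n^{-1}\sum_{i=1}^n(Y_i - \mu_i) + n^{-1}\sum_{i=1}^n\mu_i\Big|^r\Big] \]
\[ \le E\Big[\sup_{n\le m}n^{-r}\Big\{\Big|\sum_{i=1}^n(Y_i - \mu_i)\Big| + \Big|\sum_{i=1}^n\mu_i\Big|\Big\}^r\Big] \]
\[ \le 2^rE\Big[\sup_{n\le m}\Big|n^{-1}\sum_{i=1}^n(Y_i - \mu_i)\Big|^r\Big] + 2^r\sup_{n\le m}\Big|n^{-1}\sum_{i=1}^n\mu_i\Big|^r \]
\[ \le 2^rE\Big[\sup_{n\le m}\Big|n^{-1}\sum_{i=1}^n(Y_i - \mu_i)\Big|^r\Big] + 2^r\sup_{n\ge1}\Big|n^{-1}\sum_{i=1}^n\mu_i\Big|^r. \]

But $\sup_{n\ge1}|n^{-1}\sum_{i=1}^n\mu_i|$ is finite, so we take $EY_i = 0$, all $i$. This gives us that $|S_n|^\nu = |\sum_{i=1}^nY_i|^\nu$, $n \ge 1$, is a nonnegative $L^1$ submartingale, which will be useful. We have

\[ E\Big[\sup_{n\le m}|n^{-1}S_n|^r\Big] = r\int_0^\infty t^{r-1}P\Big(\max_{n\le m}n^{-1}|S_n| > t\Big)dt \]
\[ \le 1 + r\int_1^\infty t^{r-1}P\Big(\max_{n\le m}n^{-\nu}|S_n|^\nu > t^\nu\Big)dt \]
\[ \le 1 + r\int_1^\infty t^{r-1}t^{-\nu}\Big\{E|Y_1|^\nu + \sum_{k=2}^mk^{-\nu}E(|S_k|^\nu - |S_{k-1}|^\nu)\Big\}dt, \]

(applying a maximal inequality; see Chow and Teicher, 1978, Theorem 8, p. 243),

\[ \le 1 + C\Big\{E|Y_1|^\nu + \sum_{k=2}^mk^{-\nu}E(|S_k|^\nu - |S_{k-1}|^\nu)\Big\}, \]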


where $C$ is some positive constant, independent of $m$. It now suffices to show that $\sum_{k=2}^mk^{-\nu}E(|S_k|^\nu - |S_{k-1}|^\nu)$ is bounded above by some constant, for all $m$. We have

\[ \sum_{k=2}^mk^{-\nu}E(|S_k|^\nu - |S_{k-1}|^\nu) \le C\sum_{k=2}^mk^{-\nu}\Big\{\sum_{i=2}^{\nu}E(|S_{k-1}|^{\nu-i})\Big\}, \]

where $C$ is a generic constant, and we have used a global bound on the moments of $|Y_k|$. At this point, since $|S_{k-1}|^{\nu-i} = |\sum_{j=1}^{k-1}Y_j|^{\nu-i}$ is an absolute power of a sum of independent zero mean random variables, we apply a result shown in Chung (1951), to obtain

\[ \sum_{k=2}^mk^{-\nu}E(|S_k|^\nu - |S_{k-1}|^\nu) \le C\sum_{k=2}^mk^{-\nu}\sum_{i=2}^{\nu}k^{(\nu-i)/2} \le C\sum_{k=2}^mk^{-(\nu/2+1)} \le C < \infty \]

for all $m$. The proposition is thus proved.

The third proposition gives a matrix result which will be very useful for working with the asymptotic constraints on the sequence of design matrices encountered.

Proposition 3.3: Suppose we have a sequence of positive definite matrices, $\{A_i, i \ge 1\}$, plus a p.d. matrix $B$, and a symmetric matrix $C$. If $\sum_{i=1}^n(A_i - B) \to C$ as $n \to \infty$, then

\[ n\Big(\Big(n^{-1}\sum_{i=1}^nA_i\Big)^{-1} - B^{-1}\Big) \to -B^{-1}CB^{-1} \quad\text{as } n \to \infty. \]

Proof: We write

\[ n\Big(\Big(n^{-1}\sum_{i=1}^nA_i\Big)^{-1} - B^{-1}\Big) = \Big(n^{-1}\sum_{i=1}^nA_i\Big)^{-1}n\Big(B - n^{-1}\sum_{i=1}^nA_i\Big)B^{-1} = \Big(n^{-1}\sum_{i=1}^nA_i\Big)^{-1}\Big(\sum_{i=1}^n(B - A_i)\Big)B^{-1} \to -B^{-1}CB^{-1} \quad\text{as } n \to \infty, \]

since $\sum_{i=1}^n(A_i - B) \to C$ as $n \to \infty$ implies that $n^{-1}\sum_{i=1}^nA_i \to B$, and hence that $(n^{-1}\sum_{i=1}^nA_i)^{-1} \to B^{-1}$, as $n \to \infty$. Thus, the proposition holds.
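
Proposition 3.3 can be checked numerically in a small case. The sketch below uses $2 \times 2$ matrices with $A_i = B + 2^{-i}C_0$, so that $\sum_{i=1}^n(A_i - B) \to C_0$; the matrices `B` and `C0` and the helper names are hypothetical choices made for illustration.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

B = [[2.0, 0.5], [0.5, 1.0]]       # p.d. limit matrix (illustrative)
C0 = [[1.0, 0.2], [0.2, -0.3]]     # symmetric limit of sum_i (A_i - B)

def scaled_difference(n):
    """n * ((n^{-1} sum_i A_i)^{-1} - B^{-1}) with A_i = B + 2^{-i} C0."""
    w = sum(2.0 ** -i for i in range(1, n + 1))    # sum_i (A_i - B) = w * C0
    avg = [[B[i][j] + w * C0[i][j] / n for j in range(2)] for i in range(2)]
    ia, ib = inv2(avg), inv2(B)
    return [[n * (ia[i][j] - ib[i][j]) for j in range(2)] for i in range(2)]

Binv = inv2(B)
limit = [[-v for v in row] for row in mul2(mul2(Binv, C0), Binv)]   # -B^{-1} C0 B^{-1}
approx = scaled_difference(2000)
print(approx, limit)
```

The discrepancy between the two printed matrices is of order $n^{-1}$, in line with a first order expansion of the matrix inverse.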

We  now  turn  to  the  main  development  of  the  regression  model  results. 


3.2  A.P.O.  Rule  Under  a  Hierarchical  Bayes  Model 
Consider the following hierarchical Bayes model, an expansion of the model of Section 2.2. Suppose that conditional on $\beta = (\beta_1, \ldots, \beta_p)^T$ and $R = r$, $Y_1, Y_2, \ldots$ are independently distributed $N_k(X_i\beta, r^{-1}V)$, where $V$ is a known positive definite (p.d.) matrix, and $\{X_n, n \ge 1\}$ is a sequence of $k \times p$ matrices of rank $p$, with $k \ge p$. Suppose also that conditional on $M = m$ and $R = r$, $\beta \sim N_p(m\mathbf{1}_p, (\lambda r)^{-1}D)$, where $\mathbf{1}_p$ is a $p$-component vector of 1's, $\lambda$ ($> 0$) is known, and $D$ is a known p.d. matrix. It is assumed, as before, that $M$ and $R$ are marginally independent, with $M$ having a proper pdf, $g(m)$, on $(-\infty, \infty)$ such that $\int_{-\infty}^{\infty}m^2g(m)\,dm < \infty$, while $R \sim \text{Gamma}(\tfrac12 a, \tfrac12 b)$. Finally, we require a certain amount of stability in the behavior of the design matrices to develop asymptotics. Here, we suppose that $\sum_{i=1}^nX_i^TV^{-1}X_i - n\Sigma^{-1} \to H$ as $n \to \infty$, for some pair of matrices, $\Sigma$ and $H$, with $\Sigma$ positive definite. An example of a situation where this condition is satisfied is that where the design matrices are constant from some fixed point on. Then we have

\[ \sum_{i=1}^nX_i^TV^{-1}X_i = \sum_{i=1}^{n_0}X_i^TV^{-1}X_i + \sum_{i=n_0+1}^nX^TV^{-1}X. \]

We would choose $\Sigma^{-1}$ equal to $X^TV^{-1}X$ and $H$ equal to $\sum_{i=1}^{n_0}(X_i^TV^{-1}X_i - \Sigma^{-1})$. In this case the stability condition gives a measure of how far the observations are from being identically distributed.
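
The eventually-constant design example can be made concrete. The following sketch assumes $V = I_k$ with $k = p = 2$ and three arbitrary early design matrices (so $n_0 = 3$); all matrices and helper names are hypothetical. It confirms that $M_n - n\Sigma^{-1}$ is the same matrix $H$ for every $n \ge n_0$.

```python
def gram(x):
    """X'X for a k x 2 design matrix X given as a list of rows (V = I_k assumed)."""
    return [[sum(row[i] * row[j] for row in x) for j in range(2)] for i in range(2)]

def add(a, b, sign=1.0):
    """Entrywise a + sign * b for 2x2 matrices."""
    return [[a[i][j] + sign * b[i][j] for j in range(2)] for i in range(2)]

early = [[[1.0, 0.0], [1.0, 1.0]],      # X_1
         [[1.0, 2.0], [0.0, 1.0]],      # X_2
         [[2.0, 1.0], [1.0, 3.0]]]      # X_3, so n_0 = 3
x_const = [[1.0, 0.5], [0.0, 1.0]]      # design used for every i > n_0
sigma_inv = gram(x_const)               # Sigma^{-1} = X'V^{-1}X

def m_minus_n_sigma_inv(n):
    """M_n - n Sigma^{-1}, where M_n = sum_{i <= n} X_i'V^{-1}X_i."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(n):
        total = add(total, gram(early[i] if i < len(early) else x_const))
    return add(total, [[n * v for v in row] for row in sigma_inv], sign=-1.0)

H = m_minus_n_sigma_inv(3)
H50 = m_minus_n_sigma_inv(50)
print(H, H50)
```

Here the convergence in the stability condition is attained exactly at $n = n_0$, which is the strongest form the condition can take.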

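We again need the posterior distribution of the parameters to determine a stopping rule. From our discussion, the joint pdf of $Y_1, \ldots, Y_n$, $\beta$, $M$, and $R$ is given by

\[ f(y_1, \ldots, y_n, \beta, m, r) \propto r^{nk/2}\exp\Big[-\tfrac12 r\sum_{i=1}^n(y_i - X_i\beta)^TV^{-1}(y_i - X_i\beta)\Big] \times r^{p/2}\exp\Big[-\tfrac12\lambda r(\beta - m\mathbf{1}_p)^TD^{-1}(\beta - m\mathbf{1}_p)\Big] \times r^{b/2-1}\exp\left[-\tfrac12 ar\right]g(m). \tag{3.2.1} \]

Expanding the quadratics in (3.2.1), and using techniques like those of Chapter 2, we obtain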


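\[ f(y_1, \ldots, y_n, \beta, m, r) \propto r^{p/2}\exp\Big[-\tfrac12 r\{\beta - (\lambda D^{-1} + M_n)^{-1}(\lambda mD^{-1}\mathbf{1}_p + P_n)\}^T(\lambda D^{-1} + M_n)\{\beta - (\lambda D^{-1} + M_n)^{-1}(\lambda mD^{-1}\mathbf{1}_p + P_n)\}\Big] \times r^{(nk+b)/2-1}\exp\left[-\tfrac12 r(S_{n1} + S_{n2}(m) + a)\right]g(m), \tag{3.2.2} \]

where

\[ M_n = \sum_{i=1}^nX_i^TV^{-1}X_i, \qquad P_n = \sum_{i=1}^nX_i^TV^{-1}y_i, \]

and

\[ S_{n1} = \sum_{i=1}^n(y_i - X_i\hat\beta_n^{W})^TV^{-1}(y_i - X_i\hat\beta_n^{W}), \]

\[ S_{n2}(m) = (\hat\beta_n^{W} - m\mathbf{1}_p)^T(\lambda^{-1}D + M_n^{-1})^{-1}(\hat\beta_n^{W} - m\mathbf{1}_p), \]

with $\hat\beta_n^{W} = M_n^{-1}P_n$, the weighted least squares and maximum likelihood estimate of $\beta$. We will abuse notation, and not distinguish between $P_n$, $S_{n1}$ and $S_{n2}(m)$ as realizations, and the corresponding random variables. It follows from (3.2.2) that conditional on $Y_1 = y_1, \ldots, Y_n = y_n$, $M = m$, and $R = r$,

\[ \beta \sim N_p\left((\lambda D^{-1} + M_n)^{-1}(\lambda mD^{-1}\mathbf{1}_p + P_n),\ r^{-1}(\lambda D^{-1} + M_n)^{-1}\right). \tag{3.2.3} \]

Also, conditional on $Y_1 = y_1, \ldots, Y_n = y_n$, $M = m$,

\[ R \sim \text{Gamma}\left(\tfrac12(S_{n1} + S_{n2}(m) + a),\ \tfrac12(nk+b)\right), \tag{3.2.4} \]

and the joint marginal of $Y_1, \ldots, Y_n$, and $M$ is given by

\[ f(y_1, \ldots, y_n, m) \propto [S_{n1} + S_{n2}(m) + a]^{-(nk+b)/2}g(m). \tag{3.2.5} \]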

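The posterior mean of $\beta$ is thus

\[ \hat\beta_n = E(\beta\mid\mathcal{A}_n) = (\lambda D^{-1} + M_n)^{-1}(\lambda E(M\mid\mathcal{A}_n)D^{-1}\mathbf{1}_p + P_n), \]

and

\[ \hat\beta_0 = E(\beta\mid\mathcal{A}_0) = E(M)\mathbf{1}_p, \]

where $\mathcal{A}_n$ is the $\sigma$-algebra generated by $Y_1, \ldots, Y_n$ and $\mathcal{A}_0$ is taken as the trivial $\sigma$-algebra. If we estimate $\beta$ by $\hat\beta_n$, and $b > 2$, we obtain the following posterior estimation risk, via the same calculations as those of Chapter 2:

\[ E\left[\|\beta - \hat\beta_n\|^2\mid\mathcal{A}_n\right] = \operatorname{tr}(\lambda D^{-1} + M_n)^{-1}E[R^{-1}\mid\mathcal{A}_n] + \operatorname{tr}\{(\lambda D^{-1} + M_n)^{-1}(\lambda D^{-1}\mathbf{1}_p)(\lambda D^{-1}\mathbf{1}_p)^T(\lambda D^{-1} + M_n)^{-1}\}V(M\mid\mathcal{A}_n) \]
\[ \qquad = n^{-1}\operatorname{tr}(F_n)E[R^{-1}\mid\mathcal{A}_n] + \lambda^2n^{-2}\operatorname{tr}(F_nGF_n)V(M\mid\mathcal{A}_n), \tag{3.2.6} \]

where $F_n = (\lambda n^{-1}D^{-1} + n^{-1}M_n)^{-1}$ and $G = D^{-1}\mathbf{1}_p\mathbf{1}_p^TD^{-1}$. Note that

\[ E(R^{-1}\mid\mathcal{A}_n, M) = \frac{S_{n1} + S_{n2}(M) + a}{nk + b - 2}. \tag{3.2.7} \]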


We can write (3.2.6) as

\[ E\left[\|\beta - \hat\beta_n\|^2\mid\mathcal{A}_n\right] = n^{-1}\operatorname{tr}(F_n)\{[S_{n1} + E(S_{n2}(M)\mid\mathcal{A}_n) + a]/(nk+b-2)\} + n^{-2}\lambda^2\operatorname{tr}(F_nGF_n)V[M\mid\mathcal{A}_n] = n^{-1}U_n + n^{-1}R_n, \tag{3.2.8} \]

where

\[ U_n = \operatorname{tr}(\Sigma)S_{n1}/(nk+b-2); \]

\[ R_n = \operatorname{tr}(F_n - \Sigma)S_{n1}/(nk+b-2) + \operatorname{tr}(F_n)\{(E[S_{n2}(M)\mid\mathcal{A}_n] + a)/(nk+b-2)\} + n^{-1}\lambda^2\operatorname{tr}(F_nGF_n)V[M\mid\mathcal{A}_n], \tag{3.2.9} \]

and $\Sigma$ is a p.d. matrix. We want $U_n$ to be the dominant element of (3.2.8), as in Chapter 2. There, however, $F_n$ automatically converged to $\Sigma$. Here, the imposed stability constraint, $\sum_{i=1}^nX_i^TV^{-1}X_i - n\Sigma^{-1} \to H$, implies that $n^{-1}\sum_{i=1}^nX_i^TV^{-1}X_i = n^{-1}M_n \to \Sigma^{-1}$ and hence that $F_n \to \Sigma$, as $n \to \infty$. With this implication, we can show that $U_n$ is the dominant term, much as in Chapter 2. First note that conditional on $R = r$ and $\beta$, $S_{n1}/nk$ is the m.l.e. of $r^{-1}$. It follows that $U_n \to (\operatorname{tr}\Sigma)R^{-1}$ a.s. as $n \to \infty$. Also, $V(M\mid\mathcal{A}_n)$ is a positive supermartingale, and thus almost surely convergent. Finally, since $E[S_{n2}(M)] = E\{R^{-1}E[RS_{n2}(M)\mid M, R]\} = E[R^{-1}E(\chi^2_p)] = pE(R^{-1}) < \infty$, it follows that $E[S_{n2}(M)\mid\mathcal{A}_n]$ is $O_p(1)$. Combining these facts we have that $U_n \to (\operatorname{tr}\Sigma)R^{-1}$ and $R_n \to 0$ as $n \to \infty$. It follows that the stopping rule

\[ T = T_c = \inf\{n \ge n_0 : U_n \le cn^2\} \tag{3.2.10} \]

is A.P.O. in the sense of Bickel and Yahav (1967). Note that this is the "same" rule as that developed in Chapter 2, and, again, we can take $n_0 = 1$, for the moment.

Remark 3.1. As in Chapter 2, our A.P.O. rule is not dependent on the prior distribution of $M$.

Before establishing a risk expansion, and proving the asymptotic nondeficiency of the proposed A.P.O. rule, we state a lemma containing the first order properties of the procedure.

Lemma 3.1: Suppose $b > 2$. Then

(i)   $U_T \to \operatorname{tr}(\Sigma)R^{-1}$ a.s. as $c \to 0$;
(ii)  $E(U_T) \to \operatorname{tr}(\Sigma)E(R^{-1})$ as $c \to 0$;
(iii) $c^{1/2}T \to (\operatorname{tr}\Sigma)^{1/2}R^{-1/2}$ a.s. as $c \to 0$;
(iv)  $E(c^{1/2}T) \to (\operatorname{tr}\Sigma)^{1/2}E(R^{-1/2})$ as $c \to 0$;
(v)   $V(M\mid\mathcal{A}_T) \to V(M\mid\beta, R)$ a.s. as $c \to 0$.

Proof: First we prove (i). It follows from the definition of the stopping rule that $T \to \infty$ a.s. as $c \to 0$. In conjunction with the previously noted fact that $U_n \to (\operatorname{tr}\Sigma)R^{-1}$ a.s. as $n \to \infty$, the result follows. To establish (ii), use Proposition 3.1 to write $S_{n1} = \sum_{i=1}^nZ_i$, where the $Z$'s are independent with $Z_1 \sim r^{-1}\chi^2_{k-p}$ and $Z_i \sim r^{-1}\chi^2_k$, $i \ge 2$, given $\beta$ and $R = r$. Then

sup  (S  ,/nk  +  b  -  2) 


n>n(j 


<  E(Z^)  +  KE 


<  E(Zj^)  +  KE 


sup  (Jz./n-l) 


(|/iAo-0 


using Doob's maximal inequality applied to the backward martingale, $\{\sum_{i=2}^nZ_i/(n-1)\}$, and where $K$ is some constant. Dominated convergence then gives (ii). Arguments as in Lemmas 2.1 and 2.2 suffice to establish (iii) - (v).

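We now develop the second order properties of our A.P.O. procedure. We start with a Bayes risk expansion. The result should be compared to that of Theorem 2.1.

Theorem 3.1: If $(n_0-1)k > 9$, $b > 2$, $E(M^4) < \infty$, and $M_n - n\Sigma^{-1} \to H$ as $n \to \infty$, then

\[ E\left[\|\beta - \hat\beta_T\|^2 + cT\right] = 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\}(a/2)^{1/2} + c\left[(2k)^{-1} - \operatorname{tr}(\Sigma H\Sigma)/\operatorname{tr}(\Sigma) - \lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma) + \lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\beta,R)]\right] + o(c) \quad\text{as } c \to 0. \tag{3.2.11} \]

Proof: Let $W_n = E(R^{-1}\mid\mathcal{A}_n)$. Then, exactly as in (2.2.28),

\[ E\left[\|\beta - \hat\beta_T\|^2 + cT\right] = E\Big\{2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}E(R^{-1/2}\mid\mathcal{A}_T) + 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\left[W_T^{1/2} - E(R^{-1/2}\mid\mathcal{A}_T)\right] + T^{-1}\left[(\operatorname{tr}\Sigma)^{1/2}W_T^{1/2} - c^{1/2}T\right]^2 + T^{-1}[\operatorname{tr}(F_T) - \operatorname{tr}(\Sigma)]W_T + T^{-2}\lambda^2\operatorname{tr}(F_TGF_T)V(M\mid\mathcal{A}_T)\Big\}. \tag{3.2.12} \]

As in Theorem 2.1, we establish Theorem 3.1 by showing that as $c \to 0$,

\[ E\left[2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}E(R^{-1/2}\mid\mathcal{A}_T)\right] = 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\}(a/2)^{1/2}; \tag{3.2.13} \]

\[ E\left[2c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}\{W_T^{1/2} - E(R^{-1/2}\mid\mathcal{A}_T)\}\right] \to (2k)^{-1}; \tag{3.2.14} \]

\[ E\left[c^{-1}T^{-1}(\operatorname{tr}(F_T) - \operatorname{tr}(\Sigma))W_T\right] \to -\operatorname{tr}(\Sigma H\Sigma)/\operatorname{tr}(\Sigma) - \lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma); \tag{3.2.15} \]

\[ E\left[c^{-1}T^{-2}\lambda^2\operatorname{tr}(F_TGF_T)V(M\mid\mathcal{A}_T)\right] \to \lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\beta,R)]; \tag{3.2.16} \]

\[ E\left[c^{-1}T^{-1}\{(\operatorname{tr}\Sigma)^{1/2}W_T^{1/2} - c^{1/2}T\}^2\right] \to 0. \tag{3.2.17} \]

This is done in Appendix E.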


Remark 3.2. The chief technical difference between the vector of means model and the regression model is that the second provides independent, but not identically distributed observations. This has a direct impact on the proofs of (3.2.13) through (3.2.17). Nonetheless, Remark 2.1 is appropriate here, too. That is, other stopping rules, both prior independent and prior dependent, exhibit the same risk expansion. The chief structural difference is the presence of a "design effect," $-\operatorname{tr}(\Sigma H\Sigma)/\operatorname{tr}(\Sigma)$, among the second order terms.


We shall now prove that if $N = N_c$ denotes the Bayes stopping rule, then $E[\|\beta - \hat\beta_N\|^2 + cN]$ has the same expression as given in the r.h.s. of (3.2.11).


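Theorem 3.2: If $(n_0-1)k > 9$, $b > 2$, $E(M^4) < \infty$, and $M_n - n\Sigma^{-1} \to H$ as $n \to \infty$, then for the Bayes stopping rule $N$,

\[ E\left[\|\beta - \hat\beta_N\|^2 + cN\right] = 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\}(a/2)^{1/2} + c\left[(2k)^{-1} - \operatorname{tr}(\Sigma H\Sigma)/\operatorname{tr}(\Sigma) - \lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma) + \lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\beta,R)]\right] + o(c) \quad\text{as } c \to 0, \tag{3.2.18} \]

where $G = D^{-1}\mathbf{1}_p\mathbf{1}_p^TD^{-1}$.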


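Proof: The decomposition of Theorem 3.1, equation (3.2.12), is valid here, with "N" replacing "T". It follows that to establish this theorem, it suffices to prove a set of asymptotic relationships as in (3.2.13) - (3.2.17):

\[ E\left[2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}E(R^{-1/2}\mid\mathcal{A}_N)\right] = 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\}(a/2)^{1/2}, \tag{3.2.19} \]

\[ E\left[2c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}\{W_N^{1/2} - E(R^{-1/2}\mid\mathcal{A}_N)\}\right] \to (2k)^{-1}, \tag{3.2.20} \]

\[ E\left[c^{-1}N^{-1}(\operatorname{tr}F_N - \operatorname{tr}\Sigma)W_N\right] \to -\operatorname{tr}(\Sigma H\Sigma)/\operatorname{tr}(\Sigma) - \lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma), \tag{3.2.21} \]

\[ E\left[c^{-1}N^{-2}\lambda^2\operatorname{tr}(F_NGF_N)V(M\mid\mathcal{A}_N)\right] \to \lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\beta,R)], \tag{3.2.22} \]

\[ E\left[c^{-1}N^{-1}\{(\operatorname{tr}\Sigma)^{1/2}W_N^{1/2} - c^{1/2}N\}^2\right] \to 0, \quad\text{as } c \to 0. \tag{3.2.23} \]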

Here, as in Theorem 3.1, we need the behavior of $N$ to even get started. Write $L_n(c) = E[\|\beta - \hat\beta_n\|^2 + cn]$. Following Bickel and Yahav (1967), it can be shown that for any stopping rule $\tau = \tau_c$, $L_\tau(c) \sim \inf_n L_n(c)$ if and only if $\tau_c \sim T_c$ a.s. as $c \to 0$. $N$ being the Bayes rule, we must have $N_c \sim T_c$ a.s. as $c \to 0$. Then, using Lemma 3.1, $cN^2 \to (\operatorname{tr}\Sigma)R^{-1}$ a.s. as $c \to 0$ ($N_c \to \infty$ a.s. as $c \to 0$). This behavior for $N$ is sufficient to establish the appropriate pointwise convergences associated with the integrands of (3.2.19) - (3.2.23). (See the proof of Theorem 3.1.) We will find that (3.2.23) is forced, given the other relationships, and the fact that $N$ is Bayes. It remains to establish uniform integrabilities. The details are provided in Appendix F.

3.3 A.P.O. Rule Under an Improper Prior

Suppose we have the model of Section 3.2, with the following exception: $g(m) = 1$ on $(-\infty, \infty)$. The resulting prior distribution on $\beta$ will be improper, but the posterior distribution of $\beta$ given $Y_i = y_i$ ($1 \le i \le n$) will be proper for every $n \ge 1$. It follows that the Bayes risk of any procedure under this model is infinite. We shall proceed formally to develop our estimator and stopping rule, and then evaluate their performance in the context of the model of Section 3.2.

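Once again, in order to motivate the A.P.O. rule we must first find the formal posterior distribution of $\beta$ given $Y_i = y_i$ ($1 \le i \le n$). Note that the joint pdf of $Y_1, \ldots, Y_n$, $\beta$, $M$, and $R$ is given by

$$
\begin{aligned}
f(y_1, \ldots, y_n, \beta, m, r)
&\propto r^{\frac12 nk}\exp\left[-\tfrac{r}{2}\sum_{i=1}^{n}(y_i - X_i\beta)^TV^{-1}(y_i - X_i\beta)\right] \\
&\quad \times r^{\frac12 p}\exp\left[-\tfrac{\lambda r}{2}(\beta - m1_p)^TD^{-1}(\beta - m1_p)\right] r^{\frac12 b - 1}e^{-\frac12 ar}.
\end{aligned}
\tag{3.3.1}
$$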

Writing $C = D^{-1} - (1_p^TD^{-1}1_p)^{-1}D^{-1}1_p1_p^TD^{-1}$, it follows that

$$(\beta - m1_p)^TD^{-1}(\beta - m1_p) = (1_p^TD^{-1}1_p)\left[m - (1_p^TD^{-1}1_p)^{-1}1_p^TD^{-1}\beta\right]^2 + \beta^TC\beta. \tag{3.3.2}$$
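Identity (3.3.2) is a completion of the square in $m$. As an illustrative check only, the sketch below evaluates both sides for a diagonal $D$; the diagonal restriction and the function name are assumptions made to keep the example short, since the text allows a general $D$.

```python
def quadratic_form_split(beta, m, d):
    """Return (lhs, rhs) of identity (3.3.2) for D = diag(d), an assumed special case."""
    s = sum(1.0 / di for di in d)                    # 1' D^{-1} 1
    w = sum(bi / di for bi, di in zip(beta, d))      # 1' D^{-1} beta
    lhs = sum((bi - m) ** 2 / di for bi, di in zip(beta, d))
    # beta' C beta with C = D^{-1} - s^{-1} D^{-1} 1 1' D^{-1}
    beta_c_beta = sum(bi * bi / di for bi, di in zip(beta, d)) - w * w / s
    rhs = s * (m - w / s) ** 2 + beta_c_beta
    return lhs, rhs
```

Taking `beta` proportional to the vector of ones and `m` equal to the common value makes both sides zero, which reflects $C1_p = 0$.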


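Integrating the joint pdf in (3.3.1) with respect to $m$, using (3.3.2), we find

$$
\begin{aligned}
f(y_1, \ldots, y_n, \beta, r)
&\propto r^{\frac12 nk}\exp\left[-\tfrac{r}{2}\sum_{i=1}^{n}(y_i - X_i\beta)^TV^{-1}(y_i - X_i\beta)\right] \\
&\quad \times r^{\frac12(p-1)}\exp\left[-\tfrac{\lambda r}{2}\beta^TC\beta\right] r^{\frac12 b - 1}e^{-\frac12 ar}.
\end{aligned}
\tag{3.3.3}
$$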


As before, let

$$M_n = \sum_{i=1}^{n}X_i^TV^{-1}X_i, \qquad P_n = \sum_{i=1}^{n}X_i^TV^{-1}y_i. \tag{3.3.4}$$


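Then, it follows after some algebra that

$$
\begin{aligned}
&\sum_{i=1}^{n}(y_i - X_i\beta)^TV^{-1}(y_i - X_i\beta) + \lambda\beta^TC\beta \\
&\quad = \left(\beta - (M_n+\lambda C)^{-1}P_n\right)^T(M_n+\lambda C)\left(\beta - (M_n+\lambda C)^{-1}P_n\right)
 + \sum_{i=1}^{n}y_i^TV^{-1}y_i - P_n^T(M_n+\lambda C)^{-1}P_n \\
&\quad = \left(\beta - (M_n+\lambda C)^{-1}P_n\right)^T(M_n+\lambda C)\left(\beta - (M_n+\lambda C)^{-1}P_n\right) \\
&\qquad + \sum_{i=1}^{n}(y_i - X_iM_n^{-1}P_n)^TV^{-1}(y_i - X_iM_n^{-1}P_n)
 + P_n^TM_n^{-1}S(S^TM_n^{-1}S + I)^{-1}S^TM_n^{-1}P_n,
\end{aligned}
\tag{3.3.5}
$$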


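where we have written $\lambda C$ as $SS^T$ and expanded $(M_n + \lambda C)^{-1}$, since $C$ is n.n.d. with rank k - 1. Now write

$$S_{n1} = \sum_{i=1}^{n}(y_i - X_iM_n^{-1}P_n)^TV^{-1}(y_i - X_iM_n^{-1}P_n) = \sum_{i=1}^{n}(y_i - X_i\hat\beta^L_n)^TV^{-1}(y_i - X_i\hat\beta^L_n),$$

$$S^*_{n2} = P_n^TM_n^{-1}S(S^TM_n^{-1}S + I)^{-1}S^TM_n^{-1}P_n,$$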

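and, again, we will allow context to distinguish random variables and their realizations. From (3.3.3) - (3.3.5) we get

$$
\begin{aligned}
f(y_1, \ldots, y_n, \beta, r)
&\propto r^{\frac12 nk}\exp\left[-\tfrac{r}{2}\left(\beta - (M_n+\lambda C)^{-1}P_n\right)^T(M_n+\lambda C)\left(\beta - (M_n+\lambda C)^{-1}P_n\right)\right] \\
&\quad \times r^{\frac12(p+b-3)}\exp\left[-\tfrac{r}{2}(S_{n1} + S^*_{n2} + a)\right].
\end{aligned}
\tag{3.3.6}
$$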


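Formula (3.3.6) leads to two important conclusions. First, conditional on $Y_i = y_i$ ($i = 1, \ldots, n$) and $R = r$, $\beta \sim N_p\left((M_n+\lambda C)^{-1}P_n,\ r^{-1}(M_n+\lambda C)^{-1}\right)$. Let $\tilde\beta_n = (M_n+\lambda C)^{-1}P_n$ and observe that it does not depend on $r$. Second, from the joint marginal pdf of $Y_1, \ldots, Y_n$ and $R$, we can obtain that conditional on $Y_i = y_i$ ($i = 1, \ldots, n$), $R \sim \mathrm{Gamma}\left(\tfrac12(S_{n1} + S^*_{n2} + a),\ \tfrac12(nk + b - 1)\right)$. Thus, formally, the posterior risk, using $\tilde\beta_n$, is

$$
\begin{aligned}
E\left[\|\beta - \tilde\beta_n\|^2 \,\middle|\, A_n\right]
&= \mathrm{tr}(M_n+\lambda C)^{-1}E[R^{-1}|A_n] \\
&= \{\mathrm{tr}(M_n+\lambda C)^{-1}\}(S_{n1} + S^*_{n2} + a)/(nk + b - 3) \\
&= n^{-1}\mathrm{tr}(\Sigma)\{S_{n1}/(nk + b - 3)\}
 + \mathrm{tr}\{(M_n+\lambda C)^{-1} - n^{-1}\Sigma\}S_{n1}/(nk + b - 3) \\
&\quad + \{\mathrm{tr}(M_n+\lambda C)^{-1}\}(S^*_{n2} + a)/(nk + b - 3).
\end{aligned}
\tag{3.3.7}
$$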
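The second line of (3.3.7) uses the mean of $R^{-1}$ under a gamma law: with shape $\tfrac12(nk+b-1)$ and rate $\tfrac12(S_{n1}+S^*_{n2}+a)$, $E(R^{-1}) = \text{rate}/(\text{shape}-1)$. Note that the text lists the rate-type parameter first. The helper below is a small sketch of that arithmetic; the function names and argument order are illustrative assumptions.

```python
def inverse_mean_gamma(rate, shape):
    """E(1/R) for R ~ Gamma with the given shape and rate; requires shape > 1."""
    return rate / (shape - 1.0)

def formal_posterior_risk(trace_inv, s1, s2_star, a, n, k, b):
    """tr (M_n + lambda C)^{-1} times E(R^{-1} | A_n), as in the second line of (3.3.7).

    trace_inv stands for tr (M_n + lambda C)^{-1}, supplied by the caller.
    """
    rate = 0.5 * (s1 + s2_star + a)
    shape = 0.5 * (n * k + b - 1)
    return trace_inv * inverse_mean_gamma(rate, shape)
```

With `s1 = 10`, `s2_star = 1.5`, `a = 2`, `n = 5`, `k = 3`, `b = 4` the gamma factor is $13.5/16$, matching $(S_{n1}+S^*_{n2}+a)/(nk+b-3)$.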


The next step is to establish a stopping rule based on a dominant term from (3.3.7). To obtain such a dominant term we need a valid probability model, and we assume the observations $Y_i$, $i \ge 1$, come from the model of Section 3.2. Essentially the same argument as that of Section 2.3 for $\theta$ establishes that $(\mathrm{tr}\,\Sigma)\{S_{n1}/(nk + b - 3)\}$ is a dominant term. This is almost $U_n$ from equation (3.2.8), and so we propose the same stopping rule $T$ as given in (3.2.10). Thus, our "noninformative" prior procedure is to stop at time $T$, and then to estimate $\beta$ by $\tilde\beta_T$.
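Rule (3.2.10) itself is not reproduced in this section. The sketch below therefore assumes a first-crossing form, $T = \inf\{n \ge n_0 : c\,n^2 \ge U_n\}$, which is the shape suggested by inequality (A.28) for the vector-of-means case; both the form of the rule and the callable `u` that returns $U_n$ are assumptions for illustration.

```python
def first_crossing_time(u, c, n0):
    """Smallest n >= n0 with c * n^2 >= u(n).

    u  : callable returning the dominant term U_n for sample size n (assumed given)
    c  : cost per observation
    n0 : initial sample size
    """
    n = n0
    while c * n * n < u(n):
        n += 1
    return n
```

With a constant dominant term the rule behaves like $(U/c)^{1/2}$: for instance `first_crossing_time(lambda n: 5.0, 0.01, 2)` stops at the first integer above $\sqrt{500}$.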

We now develop the asymptotic Bayes risk expansion for our noninformative procedure, say $(T, \tilde\beta_T)$, under the model of Section 3.2.

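Theorem 3.3: Consider the model proposed in Section 3.2. If $(n_0-1)k > 9$, $b > 2$, $E(M^4) < \infty$, and $M_n - n\Sigma^{-1} \to H$ as $n \to \infty$, then

$$
\begin{aligned}
E\left[\|\beta - \tilde\beta_T\|^2 + cT\right]
&= 2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\right\}(a/2)^{1/2} \\
&\quad + c\left[(2k)^{-1} - \mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma) - \lambda\,\mathrm{tr}(\Sigma C\Sigma)/\mathrm{tr}(\Sigma)\right] \\
&\quad + o(c) \quad \text{as } c \to 0.
\end{aligned}
\tag{3.3.8}
$$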


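Proof: Using standard Bayesian arguments,

$$E\left[\|\beta - \tilde\beta_T\|^2 + cT\right] = E\left[\|\beta - \hat\beta_T\|^2 + cT\right] + E\left[\|\hat\beta_T - \tilde\beta_T\|^2\right]. \tag{3.3.9}$$

In view of Theorem 3.1, it suffices to show that

$$E\left[c^{-1}\|\hat\beta_T - \tilde\beta_T\|^2\right] \to \lambda\,\mathrm{tr}\{\Sigma(D^{-1}-C)\Sigma\}/\mathrm{tr}(\Sigma) - \lambda^2\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}(\Sigma)\}E[R\,V(M|\beta,R)]. \tag{3.3.10}$$

This is done in Appendix G.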

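We again evaluate the performance of the usual frequentist estimator, $\hat\beta^L_n$, in this sequential context. We have, for the procedure $(T, \hat\beta^L_T)$, the following result.

Theorem 3.4: Again, consider the proper regression model. If $(n_0-1)k > 9$, $b > 2$, $E(M^4) < \infty$, and $M_n - n\Sigma^{-1} \to H$ as $n \to \infty$, then

$$
\begin{aligned}
E\left[\|\beta - \hat\beta^L_T\|^2 + cT\right]
&= 2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\right\}(a/2)^{1/2}
 + c\left[(2k)^{-1} - \mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma)\right] \\
&\quad + o(c) \quad \text{as } c \to 0.
\end{aligned}
\tag{3.3.11}
$$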


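Proof: Standard Bayesian calculation leads to

$$E\left[\|\beta - \hat\beta^L_T\|^2 + cT\right] = E\left[\|\beta - \hat\beta_T\|^2 + cT\right] + E\left[\|\hat\beta_T - \hat\beta^L_T\|^2\right]. \tag{3.3.12}$$

In view of Theorem 3.1, it suffices to show that

$$E\left[c^{-1}\|\hat\beta_T - \hat\beta^L_T\|^2\right] \to \lambda\,\mathrm{tr}(\Sigma D^{-1}\Sigma)/\mathrm{tr}(\Sigma) - \lambda^2\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}(\Sigma)\}E[R\,V(M|\beta,R)] \quad \text{as } c \to 0. \tag{3.3.13}$$

This is done in Appendix H.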


Remark 3.3. If we specialize our results to the case where $\Sigma = D = I$, $H = 0$ and $M$ has a degenerate distribution, then it is possible to make a comparison with the results of Finster (1987). He uses a different loss structure, but because the posterior estimation risk is similar to ours, the overall risk expansions show comparable terms. In our case we obtain, for $(T, \hat\beta_T)$, that $\omega_1c^{1/2} + \omega_2c = 2c^{1/2}p^{1/2}E(R^{-1/2}) + c\{(2k)^{-1} - \lambda\}$. This agrees with Finster's expansion, upon equating our $R^{-1}$, $p$, $k$ and $\lambda$ with his $\sigma^2$, $k$, $m$ and $T_0$. By constraining $M$ to be degenerate, we return to a conjugate prior set-up, which is what Finster uses.
3.4 Summary and Comparison

It is useful at this point to collect our results in order to make comparisons. We have discussed three procedures: $(T, \hat\beta_T)$, $(T, \tilde\beta_T)$ and $(T, \hat\beta^L_T)$. As in Chapter 2, they can be viewed as methods based on decreasing awareness of, or confidence in, the "true" model. Their asymptotic performances take the following forms:


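$$
\begin{aligned}
E\left[\|\beta - \hat\beta_T\|^2 + cT\right]
&= \omega_1c^{1/2} + c\left[(2k)^{-1} - \mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma) - \lambda\,\mathrm{tr}(\Sigma D^{-1}\Sigma)/\mathrm{tr}(\Sigma)
 + \lambda^2\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}(\Sigma)\}E[R\,V(M|\beta,R)]\right] + o(c) \\
&= \omega_1c^{1/2} + c\left[(2k)^{-1} - \mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma) - \lambda\,\mathrm{tr}(\Sigma D^{-1}\Sigma)/\mathrm{tr}(\Sigma)
 + \lambda\{\mathrm{tr}(\Sigma(D^{-1}-C)\Sigma)/\mathrm{tr}(\Sigma)\}E[V(M|\beta,R)/V^*]\right] + o(c),
\end{aligned}
\tag{3.4.1}
$$

where $V^* = \{\lambda(1_p^TD^{-1}1_p)R\}^{-1} = V(M|\beta,R)$ under a uniform $(-\infty, \infty)$ prior,

$$E\left[\|\beta - \tilde\beta_T\|^2 + cT\right] = \omega_1c^{1/2} + c\left[(2k)^{-1} - \mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma) - \lambda\,\mathrm{tr}(\Sigma C\Sigma)/\mathrm{tr}(\Sigma)\right] + o(c). \tag{3.4.2}$$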


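$$E\left[\|\beta - \hat\beta^L_T\|^2 + cT\right] = \omega_1c^{1/2} + c\left[(2k)^{-1} - \mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma)\right] + o(c). \tag{3.4.3}$$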


A comparison of the second expression for the risk of $(T, \hat\beta_T)$ and the expression for the risk of $(T, \tilde\beta_T)$ shows the effect of the lost information concerning $M$. If $g(m)$ expresses a precise knowledge of $M$, then $V(M|\beta,R)/V^*$ will tend to be close to zero. If $g(m)$ expresses vague knowledge of $M$, then $V(M|\beta,R)/V^*$ will tend to be close to one, in which case the two risk expressions will coincide. A comparison of the asymptotic risks of either $(T, \hat\beta_T)$ or $(T, \tilde\beta_T)$ and $(T, \hat\beta^L_T)$ brings out the role of $\lambda$ in the model, just as in Chapter 2. Large values of $\lambda$, indicating relatively more knowledge of the variation in $\beta$ about $M1_p$, produce greater risk reduction by $(T, \hat\beta_T)$ or $(T, \tilde\beta_T)$ over $(T, \hat\beta^L_T)$. Recall that

$$\hat\beta_n = (\lambda D^{-1} + M_n)^{-1}\left(\lambda E(M|A_n)D^{-1}1_p + P_n\right),$$

which goes to $M_n^{-1}P_n = \hat\beta^L_n$ as $\lambda$ goes to zero. Similarly $\tilde\beta_n$ goes to $\hat\beta^L_n$ as $\lambda \to 0$.

It is also useful to look at the differences generated by going from the vector of means case to the regression parameters case. Comparing the statements of Theorems 2.1 through 2.4 with those of Theorems 3.1 through 3.4, we find only minor structural differences. There is the dimension reduction effect of going to the regression model, which changes the $c(2p)^{-1}$ term to $c(2k)^{-1}$. The greater the dimensional reduction provided by $\beta$, the relatively smaller the risk contribution. Note that the asymptotic nature of the term in both models is second order. The other difference is in the appearance of a "design" term, $-c\,\mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma)$. If the design matrices $X_i$ are all the same, and we write $X_i^TV^{-1}X_i = \Sigma^{-1}$, then $H$ must be null, and the extra term in the risk expansion disappears. When present, the design term has a negative coefficient, because of the matrix calculus involved. The term itself is not always negative, however. To get a better idea of what is going on, we note that a sufficient condition for $-\mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma)$ to be negative is for $H$ to be positive definite. In this case the deviations of the $X_i^TV^{-1}X_i$'s from $\Sigma^{-1}$ are points, on the whole, "farther out" in the design space. These points contribute to more accurate estimation of $\beta$. The situation of $H$ negative definite produces the opposite effect. Again, the contribution is second order.


CHAPTER   4 
SUMMARY  AND   FUTURE   RESEARCH 


4.1     Summary 

In Chapter 2 we considered sequential estimation of a vector of normal means, $\theta$, under a hierarchical Bayes model with loss given by the sum of squared errors plus (linear) cost. We determined an A.P.O. stopping rule and showed that the corresponding sequential procedure was asymptotically nondeficient with respect to the Bayes sequential estimation procedure, having the same Bayes risk expansion: $\omega_1c^{1/2} + \omega_2c + o(c)$. We then proceeded to look at the performance of estimators other than the posterior mean in conjunction with our A.P.O. rule: a diffuse prior based estimator and the traditional sample mean vector. We found risk expansions for these procedures, and identified their performance losses, which occurred in the order $c$ terms.

In Chapter 3 we considered sequential estimation of a vector of regression parameters, $\beta$, from a general linear model with normal error structure, under a hierarchical Bayes model. In an analysis parallel to that of Chapter 2, we developed an A.P.O. stopping rule and evaluated the performance of estimation procedures based on the A.P.O. rule in conjunction with the true posterior mean of $\beta$, a posterior mean arrived at via diffuse prior calculations, and the traditional weighted least squares estimator. Again, the basic procedure was shown to be nondeficient, and the remaining procedures showed deficiencies in the order $c$ terms of their risk expansions, with the least squares estimator showing the greatest deficiency. We also indicated the practical differences produced by extending to a regression model.
4.2 Further Research

There are many directions in which to pursue further research, staying in the framework of A.P.O. rules for sequential multiparameter estimation. Three quickly identified "fronts" are the model, the loss structure and the stopping rule. In the context of the model we can consider adding a general hyperprior component to the gamma parameters associated with the prior distribution of $R$. We can also consider putting a prior distribution on $\lambda$. In the context of loss structure, we can look at various extensions, perhaps to the generality of Bickel and Yahav (1968), who looked at general smooth losses having specified behavior near zero and infinity. In the context of the stopping rule, we have noted certain simple modifications of the A.P.O. rules developed, which also prove nondeficient. Specifying a class of nondeficient stopping rules could be quite interesting.

Finally, outside the realm of A.P.O. rules, it would be very practical to consider multiparameter estimation under hierarchical models when sampling is done in two or three stages. Full sequential sampling is not always even possible, much less practical. An asymptotic approach could allow the sizes of the stagewise samples to increase when cost decreases, as for example in Hall (1981).


APPENDIX   A 
DETAILS   OF   THEOREM   2.1 


Here  we  establish  the  details  of  Theorem  2.1.    For  convenience,  we  rewrite  equations 
(2.2.29)  -  (2.2.33).    First  we  prove  (2.2.29). 


Equation (2.2.29): For all $c$,

$$2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}E\left[E(R^{-1/2}|A_T)\right] = 2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\right\}(a/2)^{1/2}.$$

Proof: Since $\{E(R^{-1/2}|A_n),\ n \ge 0\}$ is a uniformly integrable martingale, the optional stopping theorem applies to the l.h.s. of (2.2.29). The result follows upon computing the prior mean of $R^{-1/2}$. We now consider (2.2.32).

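Equation (2.2.32): As $c \to 0$,

$$E\left[c^{-1}T^{-2}\lambda^2\,\mathrm{tr}(F_TGF_T)V(M|A_T)\right] \to \lambda^2\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}\,\Sigma\}E[R\,V(M|\theta,R)].$$

Proof: From Lemmas 2.1 and 2.2, and the fact that $T \to \infty$ a.s. as $c \to 0$, we obtain

$$c^{-1}T^{-2}\lambda^2\,\mathrm{tr}(F_TGF_T)V(M|A_T) \to \lambda^2\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}\,\Sigma\}R\,V(M|\theta,R) \quad \text{a.s. as } c \to 0. \tag{A.1}$$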
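The result will follow by showing that the l.h.s. of (A.1) is uniformly integrable. We exhibit an integrable dominating function. Note that, using (2.2.17), (2.2.20), and Lemma 1.2,

$$
\begin{aligned}
c^{-1}T^{-2}\lambda^2\,\mathrm{tr}(F_TGF_T)V(M|A_T)
&\le \lambda^2U_T^{-1}\mathrm{tr}(\Sigma G\Sigma)E(M^2|A_T) \\
&\le \lambda^2(\mathrm{tr}\,\Sigma G\Sigma/\mathrm{tr}\,\Sigma)\sup_{n \ge n_0}E(M^2|A_n)\left(S_{n1}/(n-1)p\right)^{-1}.
\end{aligned}
\tag{A.2}
$$

Note that the sequence $\{(S_{n1}/(n-1)p)^{-2},\ n \ge n_0\}$ is a backward submartingale, since $S_{n1}/(n-1)$ is a mean of exchangeables, and that $\{E^2(M^2|A_n),\ n_0 \le n \le \infty\}$ is a submartingale, since it is closed by $M^4$ (Chow and Teicher, 1978, Thm. 2, p. 234). Using these facts, the Schwarz inequality, and Doob's maximal inequality one gets

$$
\begin{aligned}
E\left[\sup_{n \ge n_0}E(M^2|A_n)\left(S_{n1}/(n-1)p\right)^{-1}\right]
&\le E^{1/2}\left[\sup_{n \ge n_0}E^2(M^2|A_n)\right]E^{1/2}\left[\sup_{n \ge n_0}\left(S_{n1}/(n-1)p\right)^{-2}\right] \\
&\le K\,E^{1/2}\left[E^2(M^2|A_\infty)\right]E^{1/2}\left[\left(S_{n_01}/(n_0-1)p\right)^{-2}\right] \\
&= K\,E^{1/2}\left[E^2(M^2|A_\infty)\right]E^{1/2}\left\{R^2E\left[\left(RS_{n_01}/(n_0-1)p\right)^{-2}\,\middle|\,R,\theta\right]\right\} \\
&\le K\,E^{1/2}(M^4)\,E^{1/2}(R^2)\,E^{1/2}\left[\left(\chi^2_{(n_0-1)p}\right)^{-2}\right] < \infty,
\end{aligned}
\tag{A.3}
$$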


since $(n_0-1)p > 5$ and $ER^2 < \infty$. In the above, and in what follows, $K$ is a positive generic constant which may depend on $n_0$ and $p$, and need not be the same at different steps. Thus, the l.h.s. of (2.2.34) is uniformly integrable, and so from (A.1) and (A.3) we obtain (2.2.32). We next prove (2.2.31).

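Equation (2.2.31): As $c \to 0$,

$$E\left[c^{-1}T^{-1}(\mathrm{tr}\,F_T - \mathrm{tr}\,\Sigma)W_T\right] \to -\lambda\{\mathrm{tr}(\Sigma D^{-1}\Sigma)\}/\mathrm{tr}\,\Sigma.$$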


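Proof: Note that $\{W_n,\ 0 \le n \le \infty\}$ is a martingale, with $W_\infty = R^{-1}$, since $R^{-1}$ is $A_\infty$ measurable. Also, $F_T - \Sigma = -T^{-1}\lambda\Sigma\left[D + \lambda T^{-1}\Sigma\right]^{-1}\Sigma$. It follows that

$$
c^{-1}T^{-1}\left(\mathrm{tr}(F_T - \Sigma)\right)W_T = -c^{-1}T^{-2}\,\mathrm{tr}\left(\lambda\Sigma(D + \lambda T^{-1}\Sigma)^{-1}\Sigma\right)W_T
\to -\lambda\{\mathrm{tr}(\Sigma D^{-1}\Sigma)/\mathrm{tr}\,\Sigma\} \quad \text{a.s. as } c \to 0,
\tag{A.4}
$$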
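The matrix identity for $F_T - \Sigma$ used above can be sanity-checked in the scalar case, where $F_n = (\sigma^{-1} + \lambda n^{-1}d^{-1})^{-1}$. The sketch below is only an illustration under that one-dimensional assumption, with hypothetical function names; it also shows that $n(F_n - \sigma)$ approaches $-\lambda\sigma^2/d$, the scalar analogue of the limit in (A.4).

```python
def f_n(n, sigma, d, lam):
    """Scalar F_n = (1/sigma + lam/(n d))^{-1}."""
    return 1.0 / (1.0 / sigma + lam / (n * d))

def f_minus_sigma(n, sigma, d, lam):
    """Closed form F_n - sigma = -(lam/n) sigma^2 / (d + lam sigma / n)."""
    return -(lam / n) * sigma * sigma / (d + lam * sigma / n)
```

For large `n`, `n * f_minus_sigma(n, sigma, d, lam)` is close to `-lam * sigma**2 / d`.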


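via Lemma 2.1, the martingale convergence theorem, and the fact that $T \to \infty$ a.s. as $c \to 0$. It remains to show uniform integrability. We have, using (2.2.20),

$$
\begin{aligned}
\left|c^{-1}T^{-2}\,\mathrm{tr}\left(\lambda\Sigma(D + \lambda T^{-1}\Sigma)^{-1}\Sigma\right)W_T\right|
&\le \lambda U_T^{-1}\mathrm{tr}(\Sigma D^{-1}\Sigma)W_T \\
&\le \lambda\,\mathrm{tr}(\Sigma D^{-1}\Sigma)\sup_{n \ge n_0}\left(U_n^{-1}W_n\right) \\
&\le \lambda\{\mathrm{tr}(\Sigma D^{-1}\Sigma)/\mathrm{tr}\,\Sigma\}\sup_{n \ge n_0}\left\{1 + \left[E(S_{n2}(M)|A_n) + a\right]S_{n1}^{-1}\right\},
\end{aligned}
\tag{A.5}
$$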


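where the last inequality follows from the fact that $W_n = \left(S_{n1} + E[S_{n2}(M)|A_n] + a\right)/(np+b-2)$. It now suffices to show that $\sup_{n \ge n_0}E[S_{n2}(M)|A_n]S_{n1}^{-1}$ is integrable, since the argument will imply $E\left[\sup_{n \ge n_0}S_{n1}^{-1}\right] < \infty$. Note that

$$
\begin{aligned}
E[S_{n2}(M)|A_n]
&\le E\left[(\bar X_n - M1_p)^T(n^{-1}\Sigma + \lambda^{-1}D)^{-1}(\bar X_n - M1_p)\,\middle|\,A_n\right] \\
&\le K\,E\left[(\bar X_n - M1_p)^T\Sigma^{-1}(\bar X_n - M1_p)\,\middle|\,A_n\right] \\
&\le K\left\{\bar X_n^T\Sigma^{-1}\bar X_n + E(M^2|A_n)\right\}.
\end{aligned}
\tag{A.6}
$$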


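Now, using the martingale structure of $\{E(M^2|A_n),\ n_0 \le n \le \infty\}$, the backward submartingale structures of $\{(RS_{n1}/(n-1)p)^{-2},\ n \ge n_0\}$ and $\{\bar X_n^T(R\Sigma^{-1})\bar X_n,\ n \ge n_0\}$, the Schwarz inequality, and the maximal inequality for submartingales, we can obtain

$$
\begin{aligned}
E\left[\sup_{n \ge n_0}E[S_{n2}(M)|A_n]S_{n1}^{-1}\right]
&\le K\left((n_0-1)p\right)^{-1}E\left[\sup_{n \ge n_0}\left\{\left(\bar X_n^T(R\Sigma^{-1})\bar X_n\right)\left(RS_{n1}/(n-1)p\right)^{-1} + E[M^2|A_n]\left(S_{n1}/(n-1)p\right)^{-1}\right\}\right] \\
&\le K\left\{E^{1/2}\left[\sup_{n \ge n_0}\left(\bar X_n^T(R\Sigma^{-1})\bar X_n\right)^2\right]E^{1/2}\left[\sup_{n \ge n_0}\left(RS_{n1}/(n-1)p\right)^{-2}\right]\right. \\
&\qquad \left. + E^{1/2}\left[\sup_{n \ge n_0}E^2(M^2|A_n)\right]E^{1/2}\left[R^2\sup_{n \ge n_0}\left(RS_{n1}/(n-1)p\right)^{-2}\right]\right\} \\
&\le K\left\{E^{1/2}\left[\left(\bar X_{n_0}^T(R\Sigma^{-1})\bar X_{n_0}\right)^2\right]E^{1/2}\left[\left(RS_{n_01}/(n_0-1)p\right)^{-2}\right]\right. \\
&\qquad \left. + E^{1/2}\left[E(M^4|A_\infty)\right]E^{1/2}\left[R^2\left(RS_{n_01}/(n_0-1)p\right)^{-2}\right]\right\} \\
&= K\left\{E^{1/2}\left[\left(\bar X_{n_0}^T(R\Sigma^{-1})\bar X_{n_0}\right)^2\right]E^{1/2}E\left[\left(RS_{n_01}/(n_0-1)p\right)^{-2}\,\middle|\,\theta,R\right]\right. \\
&\qquad \left. + E^{1/2}(M^4)\,E^{1/2}\left\{R^2E\left[\left(RS_{n_01}/(n_0-1)p\right)^{-2}\,\middle|\,\theta,R\right]\right\}\right\} \\
&\le K\left\{E^{1/2}\left[\left(\bar X_{n_0}^T(R\Sigma^{-1})\bar X_{n_0}\right)^2\right]E^{1/2}\left[\left(\chi^2_{(n_0-1)p}\right)^{-2}\right]
 + E^{1/2}(M^4)\,E^{1/2}(R^2)\,E^{1/2}\left[\left(\chi^2_{(n_0-1)p}\right)^{-2}\right]\right\}.
\end{aligned}
\tag{A.7}
$$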


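Since $(n_0-1)p > 5$ and $ER^2 < \infty$, it remains to show that $E\left[\left(\bar X_{n_0}^T(R\Sigma^{-1})\bar X_{n_0}\right)^2\right]$ is finite. We have, using Theorem 1, p. 55 of Searle (1971), concerning cumulants of quadratic forms for normal random vectors,

$$
\begin{aligned}
E\left[\left(\bar X_{n_0}^T(R\Sigma^{-1})\bar X_{n_0}\right)^2\right]
&= n_0^{-2}E\left\{E\left[\left(\bar X_{n_0}^T(n_0R\Sigma^{-1})\bar X_{n_0}\right)^2\,\middle|\,\theta,R\right]\right\} \\
&= n_0^{-2}E\left\{2p + p^2 + (4+2p)\,\theta^T(n_0R\Sigma^{-1})\theta + \left(\theta^T(n_0R\Sigma^{-1})\theta\right)^2\right\}.
\end{aligned}
\tag{A.8}
$$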
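The inner expectation in (A.8) is the second moment of a noncentral chi-square variable with $p$ degrees of freedom and noncentrality $\delta = \theta^T(n_0R\Sigma^{-1})\theta$, namely $2p + p^2 + (4+2p)\delta + \delta^2$. The following Monte Carlo sketch, written for a standardized vector $Z + \mu$ with $\delta = \|\mu\|^2$, is an illustration only; the function names, seed, and sample size are arbitrary assumptions.

```python
import random

def second_moment_formula(p, delta):
    """E(Q^2) for Q noncentral chi-square with p d.f. and noncentrality delta."""
    return 2 * p + p * p + (4 + 2 * p) * delta + delta * delta

def second_moment_mc(mu, draws, seed=0):
    """Monte Carlo estimate of E(||Z + mu||^4) with Z standard normal."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        q = sum((m + rng.gauss(0.0, 1.0)) ** 2 for m in mu)   # one draw of Q
        total += q * q
    return total / draws
```

With `mu = [1.0, 1.0, 0.0]` the formula gives 39, and the simulated value should agree up to Monte Carlo error.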


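Now it suffices to show $E\left[\left(\theta^T(\lambda R\Sigma^{-1})\theta\right)^2\right] < \infty$. Applying the same theorem, we have

$$
\begin{aligned}
E\left[\left(\theta^T(\lambda R\Sigma^{-1})\theta\right)^2\right]
&= E\left\{E\left[\left(\theta^T(\lambda R\Sigma^{-1})\theta\right)^2\,\middle|\,M,R\right]\right\} \\
&= E\left\{2\,\mathrm{tr}\left[(\Sigma^{-1}D)^2\right] + (\mathrm{tr}\,\Sigma^{-1}D)^2
 + \left[4\lambda 1_p^T\Sigma^{-1}D\Sigma^{-1}1_p + 2\lambda(\mathrm{tr}\,\Sigma^{-1}D)1_p^T\Sigma^{-1}1_p\right]RM^2\right. \\
&\qquad \left. + \lambda^2(1_p^T\Sigma^{-1}1_p)^2R^2M^4\right\} < \infty,
\end{aligned}
\tag{A.9}
$$

using the marginal independence of $M$ and $R$. We now turn to (2.2.30).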


Equation (2.2.30): As $c \to 0$,

$$E\left[2c^{-1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left\{W_T^{1/2} - E[R^{-1/2}|A_T]\right\}\right] \to (2p)^{-1}.$$


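Proof: We will break up the l.h.s. of (2.2.30) into more accessible components. First, define

$$V_n = E(R^{-1/2}|A_n) = d_n^{-1}E\left\{\left[(S_{n1} + S_{n2}(M) + a)/(np+b-2)\right]^{1/2}\,\middle|\,A_n\right\}, \tag{A.10}$$

where $d_n^{-1} = \left[\Gamma(\tfrac12(np+b-1))/\Gamma(\tfrac12(np+b-2))\right]\left(\tfrac12(np+b-2)\right)^{-1/2}$. Now, in the r.h.s. of (A.10) we expand $\left[(S_{n1} + S_{n2}(M) + a)/(np+b-2)\right]^{1/2} = \left(E(R^{-1}|A_n, M)\right)^{1/2}$ about $W_n = E(R^{-1}|A_n) = \left(S_{n1} + E[S_{n2}(M)|A_n] + a\right)/(np+b-2)$ in both one- and two-term Taylor series, to obtain

$$d_nV_n = W_n^{1/2} + \tfrac12E\left\{\zeta_n^{-1/2}\left[E(R^{-1}|A_n, M) - E(R^{-1}|A_n)\right]\,\middle|\,A_n\right\} \tag{A.11}$$

$$\phantom{d_nV_n} = W_n^{1/2} - \tfrac18E\left\{\xi_n^{-3/2}\left[E(R^{-1}|A_n, M) - E(R^{-1}|A_n)\right]^2\,\middle|\,A_n\right\}, \tag{A.12}$$

where $\zeta_n$ and $\xi_n$ are both between $E(R^{-1}|A_n, M)$ and $E(R^{-1}|A_n)$, and are not $A_n$ measurable. Now write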
$$
E\left[c^{-1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left(W_T^{1/2} - V_T\right)\right]
= E\left[c^{-1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left(W_T^{1/2} - d_TV_T\right)\right]
+ E\left[c^{-1/2}(\mathrm{tr}\,\Sigma)^{1/2}(d_T - 1)V_T\right].
\tag{A.13}
$$

We first show that the first term on the r.h.s. of (A.13) goes to 0 with $c$. We will use (A.12) to establish pointwise convergence of the integrand, and (A.11) for uniform integrability. To that end, using (A.12), and noting the cancellation in $E(R^{-1}|A_n, M)$ and $E(R^{-1}|A_n)$,


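$$
\begin{aligned}
0 &\le c^{-1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left(W_T^{1/2} - d_TV_T\right) \\
&= \tfrac18c^{-1/2}(\mathrm{tr}\,\Sigma)^{1/2}E\left\{\xi_T^{-3/2}\left[E(R^{-1}|A_T, M) - E(R^{-1}|A_T)\right]^2\,\middle|\,A_T\right\} \\
&= \tfrac18c^{-1/2}(\mathrm{tr}\,\Sigma)^{1/2}(Tp+b-2)^{-2}E\left\{\xi_T^{-3/2}\left[S_{T2}(M) - E(S_{T2}(M)|A_T)\right]^2\,\middle|\,A_T\right\} \\
&\le K(Tp+b-2)^{-1}U_T^{-2}E\left\{S_{T2}^2(M)\,\middle|\,A_T\right\} \\
&\le K(Tp+b-2)^{-1}U_T^{-2}\left\{(\bar X_T^T\bar X_T)^2 + E(M^4|A_T)\right\} \\
&\to 0 \quad \text{a.s. as } c \to 0,
\end{aligned}
\tag{A.14}
$$

where we have used the facts that $c^{-1/2}T^{-1} \le U_T^{-1/2}$, $\xi_T^{-3/2} \le KU_T^{-3/2}$, $E\left\{\left[S_{T2}(M) - E(S_{T2}(M)|A_T)\right]^2\,\middle|\,A_T\right\} \le E\left\{S_{T2}^2(M)\,\middle|\,A_T\right\}$, and an inequality similar to that of (A.6). To obtain uniform integrability, we need (A.11) to stay within our moment constraints. We have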
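$$
\begin{aligned}
E\left|c^{-1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left(W_T^{1/2} - d_TV_T\right)\right|
&= \tfrac12E\left|c^{-1/2}(\mathrm{tr}\,\Sigma)^{1/2}E\left\{\zeta_T^{-1/2}\left(E(R^{-1}|A_T, M) - E(R^{-1}|A_T)\right)\,\middle|\,A_T\right\}\right| \\
&\le K\,E\left\{c^{-1/2}(\mathrm{tr}\,\Sigma)^{1/2}(Tp+b-2)^{-1}\left(S_{T1}/(Tp+b-2)\right)^{-1/2}
 E\left[\left|E(S_{T2}(M)|A_T) - S_{T2}(M)\right|\,\middle|\,A_T\right]\right\} \\
&\le K\,E\left\{\left(S_{T1}/(T-1)p\right)^{-1}E(S_{T2}(M)|A_T)\right\} \\
&\le K\,E\left\{\sup_{n \ge n_0}\left[\left(S_{n1}/(n-1)p\right)^{-1}E(S_{n2}(M)|A_n)\right]\right\} < \infty,
\end{aligned}
\tag{A.15}
$$

where we have used the facts that $c^{-1/2}T^{-1} \le \left(S_{T1}/(Tp+b-2)\right)^{-1/2}$ and $\zeta_T^{-1/2} \le \left[S_{T1}/(Tp+b-2)\right]^{-1/2}$, and the last line is justified via the same argument as that of (A.7).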

To handle the second term we need to know the behavior of $(d_n - 1)$. Observe that

$$d_n = \left(\tfrac12(np+b-2)\right)^{1/2}\Gamma\left(\tfrac12(np+b-2)\right)/\Gamma\left(\tfrac12(np+b-1)\right). \tag{A.16}$$

From Lemma 1 of Alvo (1977) we can obtain two facts:

$$z^{1/2}\Gamma(z)/\Gamma(z+\tfrac12) = 1 + (8z)^{-1} + O_e(z^{-2}), \tag{A.17}$$

for large $z$ ($> 0$), where $O_e$ denotes exact order, and

$$1 < z^{1/2}\Gamma(z)/\Gamma(z+\tfrac12) < 1 + (4z)^{-1}. \tag{A.18}$$
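The two gamma-ratio facts can be examined numerically. The sketch below evaluates $z^{1/2}\Gamma(z)/\Gamma(z+\tfrac12)$ through `math.lgamma` and compares it with the bounds $1$ and $1 + (4z)^{-1}$ and with the approximation $1 + (8z)^{-1}$; it is a spot check at a few values of $z$, not a proof, and the function name is an assumption.

```python
import math

def gamma_ratio(z):
    """z^{1/2} Gamma(z) / Gamma(z + 1/2), computed on the log scale for stability."""
    return math.exp(0.5 * math.log(z) + math.lgamma(z) - math.lgamma(z + 0.5))
```

Setting `z = 0.5 * (n * p + b - 2)` gives $d_n$ of (A.16); for example `gamma_ratio(10.0)` is slightly above `1 + 1 / 80`.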
Putting $z = \tfrac12(np+b-2)$, we get the following:

$$d_n - 1 = (4np)^{-1} + O_e(n^{-2}) \tag{A.19}$$

and

$$0 < d_n - 1 < [2(np+b-2)]^{-1}. \tag{A.20}$$

Since $\{V_n,\ n \ge 0;\ V_\infty = R^{-1/2}\}$ is a uniformly integrable martingale, with (A.19) we have, easily,

$$c^{-1/2}(\mathrm{tr}\,\Sigma)^{1/2}(d_T - 1)V_T \to (4p)^{-1} \quad \text{a.s. as } c \to 0. \tag{A.21}$$


With  (A.20)  we  have,  by  arguments  similar  to  those  for  (2.2.31),  uniform  integrability,  since 
ER~  <  00.  These  results,  combined,  yield  (2.2.30).  Finally,  we  establish  the  validity  of 
(2.2.33). 

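Equation (2.2.33): As $c \to 0$,

$$E\left[c^{-1}T^{-1}\left((\mathrm{tr}\,\Sigma)^{1/2}W_T^{1/2} - c^{1/2}T\right)^2\right] \to 0.$$

Proof: Note that it is sufficient to show

$$E\left\{c^{-1}T^{-1}\left((\mathrm{tr}\,\Sigma)^{1/2}W_T^{1/2} - U_T^{1/2}\right)^2\right\} \to 0 \quad \text{as } c \to 0, \tag{A.22}$$

and

$$E\left[c^{-1}T^{-1}\left(U_T^{1/2} - c^{1/2}T\right)^2\right] \to 0 \quad \text{as } c \to 0. \tag{A.23}$$

We begin with (A.22). We will find the following elementary inequalities useful. For $a, b > 0$,

$$\left(a^{1/2} - b^{1/2}\right)^2 \le b^{-1}(a-b)^2 \tag{A.24}$$

and

$$\left(a^{1/2} - b^{1/2}\right)^2 \le |a - b|. \tag{A.25}$$

Using (2.2.20), (A.24), and the fact that $(\mathrm{tr}\,\Sigma)W_T \ge U_T > 0$ a.s., we obtain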

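$$
\begin{aligned}
c^{-1}T^{-1}\left((\mathrm{tr}\,\Sigma)^{1/2}W_T^{1/2} - U_T^{1/2}\right)^2
&\le TU_T^{-1}\left((\mathrm{tr}\,\Sigma)^{1/2}W_T^{1/2} - U_T^{1/2}\right)^2 \\
&\le TU_T^{-2}\left((\mathrm{tr}\,\Sigma)W_T - U_T\right)^2 \\
&\le KT^{-1}\left(\frac{E(S_{T2}(M)|A_T) + a}{S_{T1}/(T-1)p}\right)^2.
\end{aligned}
\tag{A.26}
$$

But $T^{-1} \to 0$ a.s., $S_{T1}/(T-1)p \to R^{-1}$ a.s. and $E(S_{T2}(M)|A_T)$ converges a.s.; hence $c^{-1}T^{-1}\left((\mathrm{tr}\,\Sigma)^{1/2}W_T^{1/2} - U_T^{1/2}\right)^2 \to 0$ as $c \to 0$. To establish uniform integrability we apply (A.25):

$$
\begin{aligned}
c^{-1}T^{-1}\left((\mathrm{tr}\,\Sigma)^{1/2}W_T^{1/2} - U_T^{1/2}\right)^2
&\le TU_T^{-1}\left((\mathrm{tr}\,\Sigma)^{1/2}W_T^{1/2} - U_T^{1/2}\right)^2 \\
&\le TU_T^{-1}\left|(\mathrm{tr}\,\Sigma)W_T - U_T\right| \\
&\le K\left\{\frac{E(S_{T2}(M)|A_T) + a}{S_{T1}/(T-1)p}\right\} \\
&\le K\left\{\sup_{n \ge n_0}\frac{E(S_{n2}(M)|A_n)}{S_{n1}/(n-1)p} + \sup_{n \ge n_0}\left(S_{n1}/(n-1)p\right)^{-1}\right\}.
\end{aligned}
\tag{A.27}
$$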


The arguments for (2.2.31) can be applied to show that the expression is integrable. Dominated convergence completes the justification of (A.22).

To establish (A.23), note that from (2.2.20) we can obtain

$$0 \le c^{1/2}T - U_T^{1/2} = c^{1/2} + c^{1/2}(T-1) - U_T^{1/2} \le c^{1/2} + U_{T-1}^{1/2} - U_T^{1/2}. \tag{A.28}$$

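Hence, using (A.28),

$$
\begin{aligned}
E\left[c^{-1}T^{-1}\left(U_T^{1/2} - c^{1/2}T\right)^2\right]
&\le E\left[2c^{-1}T^{-1}\left\{c + \left(U_{T-1}^{1/2} - U_T^{1/2}\right)^2\right\}\right] \\
&\le E\left[2T^{-1} + 2c^{-1}T^{-1}U_T^{-1}\left(U_{T-1} - U_T\right)^2\right] \\
&\le 2E(T^{-1}) + 2E\left\{TU_T^{-2}\left(U_{T-1} - U_T\right)^2\right\}.
\end{aligned}
\tag{A.29}
$$

The dominated convergence theorem can be applied to show that $E(T^{-1}) \to 0$, so it remains to consider the second term in the r.h.s. of (A.29). To this end, note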

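$$
\begin{aligned}
|U_{n-1} - U_n|
&\le (\mathrm{tr}\,\Sigma)\left[pS_{n-1,1}\left((n-1)p+b-2\right)^{-1}(np+b-2)^{-1} + (Y_n^T\Sigma^{-1}Y_n)(np+b-2)^{-1}\right] \\
&\le (\mathrm{tr}\,\Sigma)\left[pS_{n1}\left((n-1)p+b-2\right)^{-1}(np+b-2)^{-1} + (Y_n^T\Sigma^{-1}Y_n)(np+b-2)^{-1}\right].
\end{aligned}
\tag{A.30}
$$

Hence

$$
\begin{aligned}
n(U_{n-1} - U_n)^2U_n^{-2}
&\le n\left(p\left((n-1)p+b-2\right)^{-1} + (Y_n^T\Sigma^{-1}Y_n)S_{n1}^{-1}\right)^2 \\
&\le 2n\left\{p^2\left((n-1)p+b-2\right)^{-2} + (Y_n^T\Sigma^{-1}Y_n)^2S_{n1}^{-2}\right\} \\
&\le K_1n^{-1} + K_2(n-1)^{-1}(Y_n^T\Sigma^{-1}Y_n)^2\left(S_{n1}/(n-1)\right)^{-2}
\end{aligned}
\tag{A.31}
$$

$$
\begin{aligned}
&\le K_1n^{-1} + K_2\left\{(n-1)^{-1}\sum_{i=2}^{n}\left(Y_i^T(R\Sigma^{-1})Y_i\right)^2\right\}\left(RS_{n1}/(n-1)\right)^{-2} \\
&\le K_1n_0^{-1} + K_2\left\{\sup_{n \ge n_0}(n-1)^{-1}\sum_{i=2}^{n}\left(Y_i^T(R\Sigma^{-1})Y_i\right)^2\right\}\left\{\sup_{n \ge n_0}\left(RS_{n1}/(n-1)\right)^{-2}\right\}.
\end{aligned}
\tag{A.32}
$$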

Equation (A.31) can be used to show that $T(U_{T-1} - U_T)^2U_T^{-2} \to 0$ as $c \to 0$, noting that the $Y_i^T\Sigma^{-1}Y_i$ ($i = 2, \ldots$) are identically distributed. The Schwarz inequality applied to (A.32) provides an integrable dominating function when $(n_0 - 1)p > 9$, since $\left\{(n-1)^{-1}\sum_{i=2}^{n}\left(Y_i^T(R\Sigma^{-1})Y_i\right)^2\right\}$ and $\left(RS_{n1}/(n-1)\right)^{-2}$ are backward submartingales. Thus (A.23) follows from (A.29) - (A.32), and so (2.2.33) is established.


APPENDIX   B 
DETAILS   OF   THEOREM    2.2 


Here,  we  provide  the  details  necessary  to  complete  the  arguments  of  Theorem  2.2. 
We  need  a  preliminary  lemma. 


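Lemma B.1. Suppose $N$ is the Bayes stopping rule for the given sequential estimation problem. Then there exists a number $B > 0$ such that

$$W_N \le BcN^2. \tag{B.1}$$

Proof: Since $N$ is Bayes, on the set $[N = n]$ we must have, for any $k$,

$$E\left[\|\theta - \hat\theta_{n+k}\|^2\,\middle|\,A_n\right] + c(n+k) \ge E\left[\|\theta - \hat\theta_n\|^2\,\middle|\,A_n\right] + cn. \tag{B.2}$$

That is, the immediate stopping risk must be no greater than the expected risk of taking any fixed number of additional observations. Note that

$$E\left[\|\theta - \hat\theta_n\|^2\,\middle|\,A_n\right] = (\mathrm{tr}\,G_n)E[R^{-1}|A_n] + (\mathrm{tr}\,H_n)V(M|A_n), \tag{B.3}$$

where $\mathrm{tr}\,G_n = \mathrm{tr}(n\Sigma^{-1} + \lambda D^{-1})^{-1}$ is decreasing in $n$, and $\mathrm{tr}\,H_n = \lambda^2\,\mathrm{tr}\left(G_nD^{-1}1_p1_p^TD^{-1}G_n\right)$.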
In view of (B.3), (B.2) translates to

$$
(\mathrm{tr}\,G_{n+k})E[R^{-1}|A_n] + (\mathrm{tr}\,H_{n+k})E\left[V(M|A_{n+k})\,\middle|\,A_n\right] + c(n+k)
\ge (\mathrm{tr}\,G_n)E[R^{-1}|A_n] + (\mathrm{tr}\,H_n)V(M|A_n) + cn.
\tag{B.4}
$$

Rearranging (B.4), and using the fact that $V(M|A_n)$ is a positive supermartingale, gives us

$$(\mathrm{tr}\,G_n - \mathrm{tr}\,G_{n+k})E(R^{-1}|A_n) + (\mathrm{tr}\,H_n - \mathrm{tr}\,H_{n+k})V(M|A_n) \le ck. \tag{B.5}$$


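Note that

$$
G_{n+k} = \left((n+k)\Sigma^{-1} + \lambda D^{-1}\right)^{-1} = \left(G_n^{-1} + k\Sigma^{-1}\right)^{-1} = G_n - G_n\left(G_n + k^{-1}\Sigma\right)^{-1}G_n.
\tag{B.6}
$$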
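The last equality in (B.6) is an instance of the Woodbury identity. The sketch below checks it numerically for $2 \times 2$ matrices using small hand-written helpers, so no external library is needed; the helper names and the example matrices are assumptions chosen only for illustration.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def add2(x, y, sy=1.0):
    """x + sy * y for 2x2 matrices."""
    return [[x[i][j] + sy * y[i][j] for j in range(2)] for i in range(2)]

def mul2(x, y):
    """Matrix product of two 2x2 matrices."""
    return [[sum(x[i][t] * y[t][j] for t in range(2)) for j in range(2)] for i in range(2)]

def scale2(x, s):
    """Scalar multiple s * x."""
    return [[s * x[i][j] for j in range(2)] for i in range(2)]

def g(n, sigma, d, lam):
    """G_n = (n Sigma^{-1} + lam D^{-1})^{-1}."""
    return inv2(add2(scale2(inv2(sigma), n), scale2(inv2(d), lam)))

def g_update(g_n, sigma, k):
    """Right-hand side of (B.6): G_n - G_n (G_n + Sigma / k)^{-1} G_n."""
    middle = inv2(add2(g_n, scale2(sigma, 1.0 / k)))
    return add2(g_n, mul2(mul2(g_n, middle), g_n), sy=-1.0)
```

Comparing `g(n + k, sigma, d, lam)` with `g_update(g(n, sigma, d, lam), sigma, k)` entry by entry should give agreement up to rounding, and the trace of $G_n$ should decrease in $n$, as stated after (B.3).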


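Substituting in (B.5), we obtain, writing $A = D^{-1}1_p1_p^TD^{-1}$,

$$
\begin{aligned}
&\mathrm{tr}\left[G_n(G_n + k^{-1}\Sigma)^{-1}G_n\right]E(R^{-1}|A_n) \\
&\quad + \lambda^2\,\mathrm{tr}\left\{2G_nAG_n(G_n + k^{-1}\Sigma)^{-1}G_n - G_n(G_n + k^{-1}\Sigma)^{-1}G_nAG_n(G_n + k^{-1}\Sigma)^{-1}G_n\right\}V(M|A_n) \le ck.
\end{aligned}
\tag{B.7}
$$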
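Using $G_n = n^{-1}F_n$, we further obtain

$$
\begin{aligned}
&\mathrm{tr}\left[F_n(G_n + k^{-1}\Sigma)^{-1}F_n\right]E(R^{-1}|A_n) \\
&\quad + \lambda^2n^{-1}\,\mathrm{tr}\left\{2F_nAF_n(G_n + k^{-1}\Sigma)^{-1}F_n - G_n(G_n + k^{-1}\Sigma)^{-1}F_nAF_n(G_n + k^{-1}\Sigma)^{-1}F_n\right\}V(M|A_n) \le ckn^2.
\end{aligned}
\tag{B.8}
$$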


We  want  to  argue  that  the  second  term  on  the  l.h.s.  of  (B.8)  is  nonnegative,  and  hence  that  it 
can  be  dropped  from  the  inequality.    Recall  that  k  is  arbitrary.    Note  that 

tr|2FnAFn(Gn  +  k"!?)     F^  -  Gn(Gn  +  k"!?)     FnAFn(Gn  +  k"!?)     Fn  I 


tr(2ki;Ai;)  as  n  — +  oo,  for  fixed  k 


(B.9) 


(2n  -  l)tr[FnAFn  j  as  k  -+  oo,  for  fixed  n. 


(B.IO) 


Both  of  these  expressions  are  nonnegative.    It  follows  that 


tr 


Fn(Gn  +  k"^?)     ?„   E(R-^|An)  <  ckn 


(B.ll) 


for  all  n  larger  than  n  ,  say,  and  k  =  1.    For  1  <  n  <  n  ,  we  can  choose  values  ki,...,  k   ,, 

'■  n 

say,  such  that  (B.ll)  holds.    We  obtain 


min    jtr  FnfGn  +  k.-lE)    ^Fn    lE(R^l|An)  <  (cn^)     max      {k;} 
<kj<n'[    L      ^  1        ^  JJ     V  /  0<kj<n' 


(B.12) 


where  k^  =  1.     Since  tr[Fn(Gn   +  k"^?]      Fn  )   -»  k  tr^EAs)   >   0  for  each  fixed  k,  as 
n  — ♦  oo,  we  have 


$$E(R^{-1}|A_n) \le Bcn^2 \tag{B.13}$$

for some $B > 0$, on the set $[N = n]$. This is sufficient for the lemma.

We can now establish (2.2.37) - (2.2.41). Note that (2.2.37) is immediate, since the integrand is a uniformly integrable martingale. With regard to (2.2.39), we have

$$c^{-1}N^{-2}W_N \le BW_N^{-1}W_N = B, \tag{B.14}$$

applying the lemma; the result follows via the Dominated Convergence Theorem. For (2.2.40), we have

$$
\begin{aligned}
c^{-1}N^{-2}V(M|A_N)
&\le BW_N^{-1}E(M^2|A_N) \\
&\le B\,E(R|A_N)E(M^2|A_N) \\
&\le B\left\{\sup_{n \ge n_0}E(R|A_n)\right\}\left\{\sup_{n \ge n_0}E(M^2|A_n)\right\},
\end{aligned}
\tag{B.15}
$$

using Jensen's inequality and Lemma B.1. The fact that the r.h.s. of (B.15) is integrable follows from the Schwarz inequality and Doob's maximal inequality, since $E(R^2|A_n)$ and $E(M^4|A_n)$ form uniformly integrable martingales, and hence have integrable last elements. Equation (2.2.40) follows via the Dominated Convergence Theorem. Now, for (2.2.38), using (A.8), arguments in (A.12) and Jensen's inequality,
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1       1 

2 


1  ,       1 


{K  -  Vn)  =  ^"'K  -  ^N^n)  +  ^''(^N  -  1)V 


N 


1  1,1 


1  1 


<  B%Wj^^(w^  -  dj^Vj^)  +  B5NWj;^^(dp^  -  1)Vn 


<  K 


<  K 


<  K 


1  1 


E   Wn'(^n)"'[e(^N2(M)|An)  -  Sn2(M)] 


An    +  ^  n'Vn 


'Nl 


,(N-1)P, 


e(Sn2('^)|An)  +  1 


n>j{(^)       K^n2(M)|An)+l 


(B.16) 


which  is  integrable,  via  the  arguments  of  (A. 7)  -  (A. 9).     Again,  we  can  apply  dominated 
convergence  to  obtain  the  desired  result.   To  establish  (2.2.41),  note  that  for  all  c. 


0  <  c~^Ie 


Q-Orj.1  +    CT 


-  E 


\6-i^f  +  cN 


=  E 


2c   ^(trE)^jwi,-E(R  ^|Arp) 


1  1/1 


2c  ^(trS)2    W2^-e(r  2|Aj^) 


+  Erc"^T~\trFrp  -  trE)W  J  -  Erc~%"^(trFj^  -  trE)Wj^1 


+  E[c'^T~2A2tr(F^GFY)V(M|A^)]  -  E[c"%"2^2jj(p^gp^)y(j^|^^^)'j 


+  E 


c~^T"M  (trE)%|,  -c^T 


o(l)  -  E 


/  111 

c"%"M  (trE)^W^  -  c^N 


/  11         1    "■ 

c"%~^    (trS)^W|j  -  c^N 


as  c  — +  0, 


(B.17) 


since  N  is  Bayes,  and  Theorem  2.1  holds.    It  follows  that  E 

o(l)  as  c  — +  0  also;  i.e.,  (2.2.41)  holds,  and  thus  Theorem  2.2  is  proved 


/  11  1     ^ 

c~%~M  (trS)2W^  -  c^N 


APPENDIX   C 
DETAILS   OF   THEOREM   2.3 


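Under the conditions of Theorem 2.3,

$$
E\left[c^{-1}\|\hat\theta_T - \tilde\theta_T\|^2\right] \to \lambda\,\mathrm{tr}\{\Sigma(D^{-1}-C)\Sigma\}/\mathrm{tr}(\Sigma) - \lambda^2\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}(\Sigma)\}E[R\,V(M|\theta,R)] \quad \text{as } c \to 0.
\tag{C.1}
$$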

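Proof: We first establish pointwise convergence of the integrand. Recall

$$\hat\theta_n = (n\Sigma^{-1} + \lambda D^{-1})^{-1}\left(n\Sigma^{-1}\bar X_n + \lambda D^{-1}1_pE(M|A_n)\right)$$

and

$$\tilde\theta_n = (n\Sigma^{-1} + \lambda C)^{-1}n\Sigma^{-1}\bar X_n,$$

where $C = D^{-1} - (1_p^TD^{-1}1_p)^{-1}D^{-1}1_p1_p^TD^{-1}$. Hence, for $n \ge 1$,

$$
\begin{aligned}
\|\tilde\theta_n - \hat\theta_n\|^2
&= \left\|(n\Sigma^{-1} + \lambda C)^{-1}(n\Sigma^{-1})\bar X_n - (n\Sigma^{-1} + \lambda D^{-1})^{-1}\left\{(n\Sigma^{-1})\bar X_n + \lambda D^{-1}E(M|A_n)1_p\right\}\right\|^2 \\
&= \left\|\bar X_n - (n\Sigma^{-1} + \lambda C)^{-1}(\lambda C)\bar X_n - \bar X_n + (n\Sigma^{-1} + \lambda D^{-1})^{-1}(\lambda D^{-1})\left\{\bar X_n - E(M|A_n)1_p\right\}\right\|^2 \\
&= \lambda^2\left\|\left\{(n\Sigma^{-1} + \lambda D^{-1})^{-1}D^{-1} - (n\Sigma^{-1} + \lambda C)^{-1}C\right\}\left(\bar X_n - E(M|A_n)1_p\right)\right\|^2,
\end{aligned}
\tag{C.2}
$$

since $C1_p = 0$. Hence, using (C.2),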


c      J2'~p~P'p| 


=  aV^t~2 


{(E"^AT"lp"^)     p-1  -  (e-^AT-^c)     c}(x^  -  E(M|A^)lp)| 


(C.3) 


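We have, via arguments in Lemma 2.2, and the continuity of matrix inversion, that
\[
(\bar X_T-E(M\mid\mathcal{A}_T)\mathbf{1}_p)\to(\theta-E(M\mid\theta,R)\mathbf{1}_p);
\tag{C.4}
\]
\[
(\Sigma^{-1}+\lambda T^{-1}D^{-1})^{-1}D^{-1}-(\Sigma^{-1}+\lambda T^{-1}C)^{-1}C\to\Sigma(D^{-1}-C)
 \quad\text{a.s. as } c\to0.
\tag{C.5}
\]
Using these and Lemma 2.1, we have
\[
c^{-1}\|\tilde\theta_T-\hat\theta_T\|^2
 \to\lambda^2(\operatorname{tr}\Sigma)^{-1}R\,\bigl\|\Sigma(D^{-1}-C)(\theta-E(M\mid\theta,R)\mathbf{1}_p)\bigr\|^2
 =\Phi\ \text{(say), a.s. as } c\to0.
\tag{C.6}
\]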
At this point, $\Phi$ does not look like the right expression. We first show uniform integrability of $\{c^{-1}\|\tilde\theta_T-\hat\theta_T\|^2,\ c>0\}$, and then indicate how $E(\Phi)$ is correct.

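Using (2.2.14), (C.3), and basic properties of the natural matrix norm, it can be shown that
\[
\begin{aligned}
c^{-1}\|\tilde\theta_T-\hat\theta_T\|^2
 &\le \lambda^2U_T^{-1}\bigl\|(\Sigma^{-1}+\lambda T^{-1}D^{-1})^{-1}D^{-1}-(\Sigma^{-1}+\lambda T^{-1}C)^{-1}C\bigr\|^2
   \,\bigl\|\bar X_T-E(M\mid\mathcal{A}_T)\mathbf{1}_p\bigr\|^2 \\
 &\le KU_T^{-1}\bigl\|\bar X_T-E(M\mid\mathcal{A}_T)\mathbf{1}_p\bigr\|^2 \\
 &\le K\Bigl\{\bigl(S_{T1}/((T-1)p)\bigr)^{-1}\|\bar X_T\|^2+\bigl(S_{T1}/((T-1)p)\bigr)^{-1}E^2(M\mid\mathcal{A}_T)\Bigr\}.
\end{aligned}
\tag{C.7}
\]
Uniform integrability follows via arguments similar to those used to establish (2.2.31). (See (A.7).)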

We can now look at $E(\Phi)$. It is important to note that in its current form, $E(\Phi)$, obviously nonnegative, demonstrates that there is a cost associated with using $(T,\tilde\theta_T)$ versus $(T,\hat\theta_T)$. This is not so apparent in (2.3.14). We have


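\[
\begin{aligned}
E(\Phi) &= E\bigl[\lambda^2(\operatorname{tr}\Sigma)^{-1}R\,\bigl\|\Sigma(D^{-1}-C)(\theta-E(M\mid\theta,R)\mathbf{1}_p)\bigr\|^2\bigr] \\
 &= \lambda(\operatorname{tr}\Sigma)^{-1}E\bigl[\lambda R(\theta-M\mathbf{1}_p+M\mathbf{1}_p-E(M\mid\theta,R)\mathbf{1}_p)^{T} \\
 &\qquad\times(D^{-1}-C)\Sigma^2(D^{-1}-C)(\theta-M\mathbf{1}_p+M\mathbf{1}_p-E(M\mid\theta,R)\mathbf{1}_p)\bigr] \\
 &= \lambda(\operatorname{tr}\Sigma)^{-1}\Bigl\{E\bigl[\lambda R(\theta-M\mathbf{1}_p)^{T}(D^{-1}-C)\Sigma^2(D^{-1}-C)(\theta-M\mathbf{1}_p)\bigr] \\
 &\qquad+E\bigl[\lambda R(M\mathbf{1}_p-E(M\mid\theta,R)\mathbf{1}_p)^{T}(D^{-1}-C)\Sigma^2(D^{-1}-C)(M\mathbf{1}_p-E(M\mid\theta,R)\mathbf{1}_p)\bigr] \\
 &\qquad+2E\bigl[\lambda R(\theta-M\mathbf{1}_p)^{T}(D^{-1}-C)\Sigma^2(D^{-1}-C)(M\mathbf{1}_p-E(M\mid\theta,R)\mathbf{1}_p)\bigr]\Bigr\}.
\end{aligned}
\tag{C.8}
\]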


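We examine the three expectations in (C.8) individually. First,
\[
\begin{aligned}
&E\bigl[\lambda R(\theta-M\mathbf{1}_p)^{T}(D^{-1}-C)\Sigma^2(D^{-1}-C)(\theta-M\mathbf{1}_p)\bigr] \\
&\quad= E\bigl\{\lambda R\,E\bigl[(\theta-M\mathbf{1}_p)^{T}(D^{-1}-C)\Sigma^2(D^{-1}-C)(\theta-M\mathbf{1}_p)\bigm|M,R\bigr]\bigr\} \\
&\quad= E\bigl\{\lambda R\operatorname{tr}\bigl((\lambda R)^{-1}D(D^{-1}-C)\Sigma^2(D^{-1}-C)\bigr)\bigr\} \\
&\quad= \operatorname{tr}\{\Sigma(D^{-1}-C)D(D^{-1}-C)\Sigma\} \\
&\quad= \operatorname{tr}\{\Sigma(D^{-1}-C)\Sigma\},
\end{aligned}
\tag{C.9}
\]
where we have used the fact that, given $M$ and $R$, $\theta$ has mean vector $M\mathbf{1}_p$ and covariance matrix $(\lambda R)^{-1}D$. Second,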


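\[
\begin{aligned}
&E\bigl[\lambda R(M\mathbf{1}_p-E(M\mid\theta,R)\mathbf{1}_p)^{T}(D^{-1}-C)\Sigma^2(D^{-1}-C)(M\mathbf{1}_p-E(M\mid\theta,R)\mathbf{1}_p)\bigr] \\
&\quad= E\bigl\{\lambda R\,E\bigl[(M\mathbf{1}_p-E(M\mid\theta,R)\mathbf{1}_p)^{T}(D^{-1}-C)\Sigma^2(D^{-1}-C)(M\mathbf{1}_p-E(M\mid\theta,R)\mathbf{1}_p)\bigm|\theta,R\bigr]\bigr\} \\
&\quad= E\bigl[\lambda RV(M\mid\theta,R)\,\mathbf{1}_p^{T}(D^{-1}-C)\Sigma^2(D^{-1}-C)\mathbf{1}_p\bigr] \\
&\quad= \lambda\operatorname{tr}\bigl(\Sigma(D^{-1}-C)\mathbf{1}_p\mathbf{1}_p^{T}(D^{-1}-C)\Sigma\bigr)E[RV(M\mid\theta,R)] \\
&\quad= \lambda\operatorname{tr}(\Sigma G\Sigma)E[RV(M\mid\theta,R)],
\end{aligned}
\tag{C.10}
\]
where, recall, $G=D^{-1}\mathbf{1}_p\mathbf{1}_p^{T}D^{-1}$. Finally,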


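\[
\begin{aligned}
&E\bigl[\lambda R(\theta-M\mathbf{1}_p)^{T}(D^{-1}-C)\Sigma^2(D^{-1}-C)(M\mathbf{1}_p-E(M\mid\theta,R)\mathbf{1}_p)\bigr] \\
&\quad= E\bigl[\lambda R(\theta-E(M\mid\theta,R)\mathbf{1}_p+E(M\mid\theta,R)\mathbf{1}_p-M\mathbf{1}_p)^{T}
   (D^{-1}-C)\Sigma^2(D^{-1}-C)(M\mathbf{1}_p-E(M\mid\theta,R)\mathbf{1}_p)\bigr] \\
&\quad= E\bigl\{\lambda R\,E\bigl[(\theta-E(M\mid\theta,R)\mathbf{1}_p)^{T}(D^{-1}-C)\Sigma^2(D^{-1}-C)
   (M\mathbf{1}_p-E(M\mid\theta,R)\mathbf{1}_p)\bigm|\theta,R\bigr]\bigr\} \\
&\qquad- E\bigl\{\lambda R\,E\bigl[(M\mathbf{1}_p-E(M\mid\theta,R)\mathbf{1}_p)^{T}(D^{-1}-C)\Sigma^2(D^{-1}-C)
   (M\mathbf{1}_p-E(M\mid\theta,R)\mathbf{1}_p)\bigm|\theta,R\bigr]\bigr\}
\end{aligned}
\tag{C.11}
\]
\[
\quad= -\lambda\operatorname{tr}(\Sigma G\Sigma)E[RV(M\mid\theta,R)],
\tag{C.12}
\]
where the first term of (C.11) is identically 0, and the value of the second follows from (C.10). Using (C.9), (C.10), and (C.12), we find that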


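\[
0\le E(\Phi)=\lambda\operatorname{tr}(\Sigma(D^{-1}-C)\Sigma)/\operatorname{tr}(\Sigma)
 -\lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\theta,R)],
\]
and the theorem is proved.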
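The simplifications in (C.2), (C.9) and (C.10) rest on three matrix facts: $C\mathbf{1}_p=0$, $(D^{-1}-C)D(D^{-1}-C)=D^{-1}-C$, and $(D^{-1}-C)\mathbf{1}_p\mathbf{1}_p^{T}(D^{-1}-C)=G$. The following is a minimal numerical sanity check, not part of the original argument; the 2x2 matrix `D` and all helper names are hypothetical choices made only for illustration.

```python
# Pure-Python 2x2 check of the matrix facts used in Appendix C.
# D is an arbitrary positive definite example, not a value from the text.

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def sub(a, b):
    """Entrywise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(s, a):
    """Scalar multiple s * a."""
    return [[s * x for x in row] for row in a]

def max_abs_diff(a, b):
    """Largest entrywise absolute difference between a and b."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

D = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical example matrix
ones = [[1.0], [1.0]]          # the vector 1_p as a column (p = 2)
ones_t = [[1.0, 1.0]]          # its transpose

D_inv = inv2(D)
G = matmul(matmul(D_inv, ones), matmul(ones_t, D_inv))  # G = D^{-1} 1 1' D^{-1}
q = matmul(matmul(ones_t, D_inv), ones)[0][0]           # scalar 1' D^{-1} 1
C = sub(D_inv, scale(1.0 / q, G))                       # C = D^{-1} - G / q
P = sub(D_inv, C)                                       # P = D^{-1} - C

C_ones = matmul(C, ones)                                # expected: zero vector
PDP = matmul(matmul(P, D), P)                           # expected: P
P11P = matmul(matmul(P, ones), matmul(ones_t, P))       # expected: G
```

Each comparison should agree up to floating-point rounding; a different positive definite `D` can be substituted to repeat the check.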


APPENDIX   D 
DETAILS   OF   THEOREM   2.4 


Under  the  conditions  of  Theorem  2.4, 


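\[
c^{-1}E\|\hat\theta_T-\bar X_T\|^2
 \to\lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma)
 -\lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\theta,R)]
 \quad\text{as } c\to0.
\tag{D.1}
\]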


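Proof: We first establish pointwise convergence of the integrand. Recall that
\[
\begin{aligned}
\hat\theta_n &= (n\Sigma^{-1}+\lambda D^{-1})^{-1}\bigl(n\Sigma^{-1}\bar X_n+\lambda D^{-1}\mathbf{1}_pE(M\mid\mathcal{A}_n)\bigr) \\
 &= \bigl[I-n^{-1}\Sigma(n^{-1}\Sigma+\lambda^{-1}D)^{-1}\bigr]\bar X_n
   +(n\Sigma^{-1}+\lambda D^{-1})^{-1}\lambda D^{-1}\mathbf{1}_pE(M\mid\mathcal{A}_n).
\end{aligned}
\tag{D.2}
\]
Hence
\[
\begin{aligned}
c^{-1}\|\hat\theta_T-\bar X_T\|^2
 &= c^{-1}\bigl\|T^{-1}\Sigma(T^{-1}\Sigma+\lambda^{-1}D)^{-1}\bar X_T
   -T^{-1}(\Sigma^{-1}+\lambda T^{-1}D^{-1})^{-1}\lambda D^{-1}\mathbf{1}_pE(M\mid\mathcal{A}_T)\bigr\|^2 \\
 &= \lambda^2c^{-1}T^{-2}\bigl\|\Sigma(\lambda T^{-1}\Sigma+D)^{-1}\bar X_T
   -(\Sigma^{-1}+\lambda T^{-1}D^{-1})^{-1}D^{-1}\mathbf{1}_pE(M\mid\mathcal{A}_T)\bigr\|^2 \\
 &\to \lambda^2(\operatorname{tr}\Sigma)^{-1}R\,\bigl\|\Sigma D^{-1}(\theta-E(M\mid\theta,R)\mathbf{1}_p)\bigr\|^2
 \quad\text{a.s. as } c\to0,
\end{aligned}
\tag{D.3}
\]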


using  (as  in  Theorem  2.3)  Lemma  2.1,  arguments  in  Lemma  2.2,  and  the  continuity  of  matrix 
inversion.  Uniform  integrability  follows  from  the  following  inequality,  similar  to  (C.7),  and 
arguments  like  those  establishing  (2.2.31): 


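\[
\begin{aligned}
c^{-1}\|\hat\theta_T-\bar X_T\|^2
 &\le K\lambda^2U_T^{-1}\bigl\{\|\bar X_T\|^2+E^2(M\mid\mathcal{A}_T)\bigr\} \\
 &\le K\Bigl\{\bigl(S_{T1}/((T-1)p)\bigr)^{-1}\|\bar X_T\|^2+\bigl(S_{T1}/((T-1)p)\bigr)^{-1}E^2(M\mid\mathcal{A}_T)\Bigr\}.
\end{aligned}
\tag{D.4}
\]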


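It remains to show that
\[
E\bigl[\lambda^2(\operatorname{tr}\Sigma)^{-1}R\,\bigl\|\Sigma D^{-1}(\theta-E(M\mid\theta,R)\mathbf{1}_p)\bigr\|^2\bigr]
 = \lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma)
 -\lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\theta,R)].
\tag{D.5}
\]
This follows from arguments virtually identical to those of (C.8)-(C.10) and (C.12). Thus, the theorem is proved.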
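The second form of the posterior mean in (D.2) rests on a short matrix identity. As a sketch (the intermediate steps below are supplied here for illustration and are not quoted from the text):

```latex
\begin{align*}
(n\Sigma^{-1}+\lambda D^{-1})^{-1}\lambda D^{-1}
 &= \bigl[\lambda^{-1}D\,(n\Sigma^{-1}+\lambda D^{-1})\bigr]^{-1}
  = \bigl[(n^{-1}\Sigma+\lambda^{-1}D)\,n\Sigma^{-1}\bigr]^{-1} \\
 &= n^{-1}\Sigma\,(n^{-1}\Sigma+\lambda^{-1}D)^{-1}, \\
(n\Sigma^{-1}+\lambda D^{-1})^{-1}n\Sigma^{-1}
 &= I-(n\Sigma^{-1}+\lambda D^{-1})^{-1}\lambda D^{-1}
  = I-n^{-1}\Sigma\,(n^{-1}\Sigma+\lambda^{-1}D)^{-1}.
\end{align*}
```

Both bracketed products in the first line equal $I+n\lambda^{-1}D\Sigma^{-1}$, which is why the two inverses coincide.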


APPENDIX   E 
DETAILS   OF   THEOREM    3.1 


Here  we  prove  the  relationships  which  establish  Theorem  3.1.     We  start  with  two 
propositions  that  are  used  to  show  uniform  integrability. 

Proposition E.1: Under the regression model of Section 3.2, if $n_0k>2u$, then
\[
E\Bigl[\sup_{n\ge n_0}\bigl(RS_{n1}/(nk+b-2)\bigr)^{-u}\Bigr]<\infty.
\]


Proof:    Using  Proposition  3.1,  we  can  write 


E 


sup  (RSjjj/nk  +  b  -  2) 


n>n 


=  E 


<    sup     — ^ — i —       E 


n>n(j 


sup  (rJz.  /n-  1 
n>noV    i=2     / 
<  KE


E^   sup     RX^Zj  /n  -  1 


ln>nr,V    i=2 


/?,  R 


<  KE 


<<0 


<  oo        when  nQk/2  >  u, 


where     we     have     used     Doob's     maximal     inequality     on     the     backward     submartingale 
{ERZ-Zn-l}. 


'i=2 


Note:       It    follows    easily    that    E 


sup  fs„i/nk  +  b  -  2) 


n>n(j 


<  00    under    the    additional 


constraint  that  b/2  >  -  u,  which  ensures  that  E(R")  <  00. 


Proposition E.2: Under the conditions of Theorem 3.1,
\[
E\Bigl[\sup_{n\ge n_0}\bigl(E\{S_{n2}(M)\mid\mathcal{A}_n\}(S_{n1})^{-1}\bigr)\Bigr]<\infty.
\]


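Proof: Note that, using Lemma 1.1,
\[
\begin{aligned}
E[S_{n2}(M)\mid\mathcal{A}_n](S_{n1})^{-1}
 &= E\bigl[(M_n^{-1}P_n-M\mathbf{1}_p)^{T}(\lambda^{-1}D+M_n^{-1})^{-1}(M_n^{-1}P_n-M\mathbf{1}_p)(S_{n1})^{-1}\bigm|\mathcal{A}_n\bigr] \\
 &= E\bigl[(M_n^{-1}P_n-\beta+\beta-M\mathbf{1}_p)^{T}(\lambda^{-1}D+M_n^{-1})^{-1}(M_n^{-1}P_n-\beta+\beta-M\mathbf{1}_p)(S_{n1})^{-1}\bigm|\mathcal{A}_n\bigr] \\
 &\le KE\bigl[\bigl\{\|M_n^{-1}P_n-\beta\|^2+\|\beta-M\mathbf{1}_p\|^2\bigr\}(S_{n1})^{-1}\bigm|\mathcal{A}_n\bigr] \\
 &= KE\bigl[\bigl\{\|R^{1/2}(M_n^{-1}P_n-\beta)\|^2+\|R^{1/2}(\beta-M\mathbf{1}_p)\|^2\bigr\}(RS_{n1})^{-1}\bigm|\mathcal{A}_n\bigr].
\end{aligned}
\]
Note that
\[
M_n^{-1}P_n-\beta = M_n^{-1}\sum_{i=1}^{n}X_i^{T}V^{-1}Y_i-\beta = M_n^{-1}\sum_{i=1}^{n}X_i^{T}V^{-1}(Y_i-X_i\beta).
\]
It follows that
\[
\bigl\|R^{1/2}(M_n^{-1}P_n-\beta)\bigr\|^2\le K\Bigl\|n^{-1}R^{1/2}\sum_{i=1}^{n}X_i^{T}V^{-1}(Y_i-X_i\beta)\Bigr\|^2.
\]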


So,  continuing,  we  have 


E[S^2(M)|An](S„i)-^ 


<  KE 


ln-lR'/'f:X?'Y"\Yi-Xi^)|    +  |R'/'(^-Mlp)||"l(RS^i)-^ 


A, 


Using  this  relation,  we  have 


sup  (E{s„2(M)|An}(S^i)-l 


<  K<^E 


/|l  jj  ||2 

sup  E     n-lR^/'EXi^y-lCYj-X./?)    (RS^i)"^ 
n>nn    VB  1  'I 


An 


+  E     sup  e(   |r'/'(^-M1p)|  (RS„i)-1 


n>n 


An 
<  K<^E^


sup 
n>nf 


n-lR'/'f:xjrY-l(Yi-X.^)| 


sup  (RSjjj/n) 


n>nQ 


+  E 


1/2 


sup  |R'/'(^-Mlp)f 


n>ng 


.1/2 


sup  (RSj^j/n) 


-2 


n>n(j 


Via  a  slight  modification  of  Proposition  E.l,  E     sup  (RS    i/n) 


n>nQ 


<  oo  if  Uok  >  5.    Also, 


n>no 


E    sup    R''(^-Mlp) 


=  E<^E 


R'^'(^-Mlp) 


R,  M 


<  KE{  J3  E 
U=l    L 


L 


R,  M 


<   00, 


since, given $R=r$ and $M=m$, $r^{1/2}(\beta_j-m)\sim N(0,\lambda^{-1}d_{jj})$, where $d_{jj}$ is the $j$th diagonal element of $D$.

Finally, if we write $R^{1/2}X_i^{T}V^{-1}(Y_i-X_i\beta)=Q_i$, then given $R=r$ and $\beta$, $Q_i\sim N_p(0,X_i^{T}V^{-1}X_i)$, independently and, since $X_n^{T}V^{-1}X_n\to\Sigma^{-1}$, the components of the $Q_i$'s have bounded moments of arbitrary order. Using Proposition 3.2, an argument similar to that above for $\|R^{1/2}(\beta-M\mathbf{1}_p)\|^2$ gives us


E 


1-1  "      I 
1 


=  E<E 


Bin 


n>no 


/?,  R 


<  oo. 


This establishes the proposition. We now proceed to the proofs of (3.2.13)-(3.2.17).


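Equation (3.2.13): For all $c$,
\[
E\bigl[2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}E(R^{-1/2}\mid\mathcal{A}_T)\bigr]
 = 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\,\frac{\Gamma\bigl(\tfrac{b-1}{2}\bigr)}{\Gamma\bigl(\tfrac{b}{2}\bigr)}\Bigl(\frac{a}{2}\Bigr)^{1/2}.
\]
Proof: This is immediate via the optional stopping theorem.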

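Equation (3.2.16): As $c\to0$,
\[
E\bigl[c^{-1}T^{-2}\lambda^2\operatorname{tr}(F_TGF_T)V(M\mid\mathcal{A}_T)\bigr]
 \to\lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\beta,R)].
\]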

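Proof: From Lemma 3.1, the behavior of $F_n$, and the fact that $T\to\infty$ a.s. as $c\to0$, we obtain
\[
c^{-1}T^{-2}\lambda^2\operatorname{tr}(F_TGF_T)V(M\mid\mathcal{A}_T)
 \to\lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}RV(M\mid\beta,R)
 \quad\text{a.s. as } c\to0.
\tag{E.1}
\]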


The result will follow by showing that the l.h.s. of (E.1) is uniformly integrable. We exhibit an integrable dominating function. Note that, using (3.2.8) and (3.2.10), and the fact that $\operatorname{tr}(F_nGF_n)$ is convergent and thus bounded,
\[
\begin{aligned}
c^{-1}T^{-2}\lambda^2\operatorname{tr}(F_TGF_T)V(M\mid\mathcal{A}_T)
 &\le K\lambda^2U_T^{-1}E(M^2\mid\mathcal{A}_T) \\
 &\le K\sup_{n\ge n_0}\bigl\{E(M^2\mid\mathcal{A}_n)\bigl(S_{n1}/(nk+b-2)\bigr)^{-1}\bigr\}.
\end{aligned}
\tag{E.2}
\]
The Schwarz inequality coupled with Proposition E.1, the fact that $\{E^2(M^2\mid\mathcal{A}_n),\ n_0\le n\le\infty\}$ is a submartingale, and Doob's maximal inequality establish that the r.h.s. of (E.2) is integrable.

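Equation (3.2.15): As $c\to0$,
\[
E\bigl[c^{-1}T^{-1}(\operatorname{tr}(F_T)-\operatorname{tr}(\Sigma))W_T\bigr]
 \to-\operatorname{tr}(\Sigma H\Sigma)/\operatorname{tr}(\Sigma)-\lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma).
\]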


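Proof: We write
\[
c^{-1}T^{-1}(\operatorname{tr}(F_T)-\operatorname{tr}(\Sigma))W_T=\operatorname{tr}\{T(F_T-\Sigma)\}\,c^{-1}T^{-2}W_T.
\]
Lemma 3.1 and the fact that $\{W_{n_0},W_{n_0+1},\ldots,W_\infty=R^{-1}\}$ is a martingale imply that $c^{-1}T^{-2}W_T\to(\operatorname{tr}\Sigma)^{-1}$ a.s. as $c\to0$. We need the behavior of $T(F_T-\Sigma)$. Note that
\[
\begin{aligned}
n(F_n-\Sigma) &= n\bigl\{(\lambda n^{-1}D^{-1}+n^{-1}M_n)^{-1}-\Sigma\bigr\} \\
 &= n\bigl\{(nM_n^{-1}-\Sigma)-nM_n^{-1}(\lambda^{-1}nD+nM_n^{-1})^{-1}nM_n^{-1}\bigr\} \\
 &= n(nM_n^{-1}-\Sigma)-nM_n^{-1}(\lambda^{-1}D+M_n^{-1})^{-1}nM_n^{-1} \\
 &\to-\Sigma H\Sigma-\Sigma(\lambda D^{-1})\Sigma\quad\text{as } n\to\infty,
\end{aligned}
\]
applying Proposition 3.3, and the fact that $M_n-n\Sigma^{-1}\to H$ as $n\to\infty$. It follows that
\[
c^{-1}T^{-1}(\operatorname{tr}(F_T)-\operatorname{tr}(\Sigma))W_T
 \to-\operatorname{tr}(\Sigma H\Sigma)/\operatorname{tr}(\Sigma)-\lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma)
 \quad\text{a.s. as } c\to0.
\tag{E.3}
\]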


It remains to show uniform integrability. We have, using (3.2.10),
\[
\begin{aligned}
\bigl|c^{-1}T^{-1}(\operatorname{tr}(F_T-\Sigma))W_T\bigr|
 &\le K\bigl(S_{T1}/(Tk+b-2)\bigr)^{-1}E[R^{-1}\mid\mathcal{A}_T] \\
 &\le KE\bigl[\bigl(RS_{T1}/(Tk+b-2)\bigr)^{-1}\bigm|\mathcal{A}_T\bigr] \\
 &\le KE\Bigl[\sup_{n\ge n_0}\bigl(RS_{n1}/(nk+b-2)\bigr)^{-1}\Bigm|\mathcal{A}_T\Bigr],
\end{aligned}
\tag{E.4}
\]
using the facts that $n\operatorname{tr}(F_n-\Sigma)$ is bounded, and that $S_{T1}/(Tk+b-2)$ is $\mathcal{A}_T$-measurable. The last expression is integrable via a minor adjustment to Proposition E.1. The result is proved.
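The second equality in the expansion of $n(F_n-\Sigma)$ is an instance of a standard inverse-of-a-sum identity. A sketch, where $A$ and $B$ are generic positive definite matrices introduced only for this illustration, and the expansion $M_n=n\Sigma^{-1}+H+o(1)$ restates the convergence $M_n-n\Sigma^{-1}\to H$ used in the proof:

```latex
\begin{align*}
(A+B)^{-1} &= A^{-1}-A^{-1}(A^{-1}+B^{-1})^{-1}A^{-1},
 \qquad A=n^{-1}M_n,\quad B=\lambda n^{-1}D^{-1}, \\
(\lambda n^{-1}D^{-1}+n^{-1}M_n)^{-1}
 &= nM_n^{-1}-nM_n^{-1}(\lambda^{-1}nD+nM_n^{-1})^{-1}nM_n^{-1}, \\
nM_n^{-1} &= \bigl(\Sigma^{-1}+n^{-1}H+o(n^{-1})\bigr)^{-1}
 = \Sigma-n^{-1}\Sigma H\Sigma+o(n^{-1}),
\end{align*}
```

so that $n(nM_n^{-1}-\Sigma)\to-\Sigma H\Sigma$, while $nM_n^{-1}\to\Sigma$ and $(\lambda^{-1}D+M_n^{-1})^{-1}\to\lambda D^{-1}$ give the second limit $-\Sigma(\lambda D^{-1})\Sigma$. The identity can be checked by multiplying its right side by $A+B$ and using $I+BA^{-1}=B(A^{-1}+B^{-1})$.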

Equation (3.2.14): As $c\to0$,
\[
E\bigl[2c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}\bigl(W_T^{1/2}-E(R^{-1/2}\mid\mathcal{A}_T)\bigr)\bigr]\to(2k)^{-1}.
\]


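Proof: We will break up the l.h.s. of (3.2.14) into more accessible components. First, define, as in Chapter 2,
\[
V_n=E(R^{-1/2}\mid\mathcal{A}_n)=d_n^{-1}E\Bigl[\bigl((S_{n1}+S_{n2}(M)+a)/(np+b-2)\bigr)^{1/2}\Bigm|\mathcal{A}_n\Bigr],
\tag{E.5}
\]
where $d_n^{-1}=\bigl\{\Gamma(\tfrac12(np+b-1))/\Gamma(\tfrac12(np+b-2))\bigr\}(\tfrac12(np+b-2))^{-1/2}$. Now, in the r.h.s. of (E.5) we expand $\bigl((S_{n1}+S_{n2}(M)+a)/(np+b-2)\bigr)^{1/2}=(E(R^{-1}\mid\mathcal{A}_n,M))^{1/2}$ about $W_n=E(R^{-1}\mid\mathcal{A}_n)=(S_{n1}+E(S_{n2}(M)\mid\mathcal{A}_n)+a)/(np+b-2)$ in both one- and two-term Taylor series, to obtain
\[
d_nV_n=W_n^{1/2}+\tfrac12E\bigl[\zeta_n^{-1/2}\bigl[E(R^{-1}\mid\mathcal{A}_n,M)-E(R^{-1}\mid\mathcal{A}_n)\bigr]\bigm|\mathcal{A}_n\bigr]
\tag{E.6}
\]
\[
\phantom{d_nV_n}=W_n^{1/2}-\tfrac18E\bigl[\xi_n^{-3/2}\bigl[E(R^{-1}\mid\mathcal{A}_n,M)-E(R^{-1}\mid\mathcal{A}_n)\bigr]^2\bigm|\mathcal{A}_n\bigr],
\tag{E.7}
\]
where $\zeta_n$ and $\xi_n$ are both between $E(R^{-1}\mid\mathcal{A}_n,M)$ and $E(R^{-1}\mid\mathcal{A}_n)$, and are not $\mathcal{A}_n$-measurable. Now write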


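\[
E\bigl[c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}\bigl(W_T^{1/2}-V_T\bigr)\bigr]
 = E\bigl[c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}\bigl(W_T^{1/2}-d_TV_T\bigr)\bigr]
 + E\bigl[c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}(d_T-1)V_T\bigr].
\tag{E.8}
\]
We first show that the first term on the r.h.s. of (E.8) goes to 0 with $c$. We will use (E.7) to establish pointwise convergence of the integrand, and (E.6) for uniform integrability. To that end, using (E.7), and noting the cancellation in $E(R^{-1}\mid\mathcal{A}_n,M)$ and $E(R^{-1}\mid\mathcal{A}_n)$,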


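\[
\begin{aligned}
0 &\le c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}\bigl(W_T^{1/2}-d_TV_T\bigr) \\
 &= \tfrac18c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}E\bigl\{\xi_T^{-3/2}\bigl[E(R^{-1}\mid\mathcal{A}_T,M)-E(R^{-1}\mid\mathcal{A}_T)\bigr]^2\bigm|\mathcal{A}_T\bigr\} \\
 &= \tfrac18c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}(Tp+b-2)^{-2}E\bigl\{\xi_T^{-3/2}\bigl[S_{T2}(M)-E(S_{T2}(M)\mid\mathcal{A}_T)\bigr]^2\bigm|\mathcal{A}_T\bigr\} \\
 &\le K(Tp+b-2)^{-1}U_T^{-2}E\bigl\{S_{T2}^2(M)\mid\mathcal{A}_T\bigr\} \\
 &\le K(Tp+b-2)^{-1}U_T^{-2}\bigl\{\|M_T^{-1}P_T\|^4+E(M^4\mid\mathcal{A}_T)\bigr\} \\
 &\to0\quad\text{a.s. as } c\to0,
\end{aligned}
\tag{E.9}
\]
where we have used the facts that $M_T^{-1}P_T\to\beta$ a.s., $c^{-1/2}T^{-1}\le U_T^{-1/2}$, $\xi_T^{-3/2}\le KU_T^{-3/2}$, $E\{[S_{T2}(M)-E(S_{T2}(M)\mid\mathcal{A}_T)]^2\mid\mathcal{A}_T\}\le E\{S_{T2}^2(M)\mid\mathcal{A}_T\}$, and an inequality similar to ones from Proposition E.2. To obtain uniform integrability, we need (E.6) to stay within our moment constraints. We have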


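\[
\begin{aligned}
E\bigl|c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}\bigl(W_T^{1/2}-d_TV_T\bigr)\bigr|
 &= \tfrac12E\Bigl|c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}E\bigl[\zeta_T^{-1/2}\bigl(E(R^{-1}\mid\mathcal{A}_T,M)-E(R^{-1}\mid\mathcal{A}_T)\bigr)\bigm|\mathcal{A}_T\bigr]\Bigr| \\
 &\le KE\Bigl\{c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}(Tp+b-2)^{-1}\bigl(S_{T1}/(Tp+b-2)\bigr)^{-1/2} \\
 &\qquad\times E\bigl[\bigl|E(S_{T2}(M)\mid\mathcal{A}_T)-S_{T2}(M)\bigr|\bigm|\mathcal{A}_T\bigr]\Bigr\} \\
 &\le KE\bigl\{\bigl(S_{T1}/((T-1)p)\bigr)^{-1}E(S_{T2}(M)\mid\mathcal{A}_T)\bigr\} \\
 &\le KE\Bigl(\sup_{n\ge n_0}\bigl(S_{n1}/((n-1)p)\bigr)^{-1}E(S_{n2}(M)\mid\mathcal{A}_n)\Bigr) \\
 &<\infty,
\end{aligned}
\tag{E.10}
\]
where we have used the facts that $c^{-1/2}T^{-1}\le K(S_{T1}/(Tp+b-2))^{-1/2}$ and $\zeta_T^{-1/2}\le(S_{T1}/(Tp+b-2))^{-1/2}$, and the last line is justified via Proposition E.2.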
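The expansions (E.6) and (E.7) are the one- and two-term Taylor expansions of the square root function. A sketch of the steps, where $x$ and $w$ are shorthand introduced here for $E(R^{-1}\mid\mathcal{A}_n,M)$ and $W_n=E(R^{-1}\mid\mathcal{A}_n)$, and $\zeta$, $\xi$ are intermediate points between $x$ and $w$:

```latex
\begin{align*}
x^{1/2} &= w^{1/2}+\tfrac12\,\zeta^{-1/2}(x-w), \\
x^{1/2} &= w^{1/2}+\tfrac12\,w^{-1/2}(x-w)-\tfrac18\,\xi^{-3/2}(x-w)^2, \\
E(x-w\mid\mathcal{A}_n) &= E\bigl[E(R^{-1}\mid\mathcal{A}_n,M)\bigm|\mathcal{A}_n\bigr]-E(R^{-1}\mid\mathcal{A}_n)=0 .
\end{align*}
```

Taking conditional expectations given $\mathcal{A}_n$, the linear term of the second expansion vanishes because $w^{-1/2}$ is $\mathcal{A}_n$-measurable, which leaves the nonpositive remainder in (E.7). In the first expansion the factor $\zeta^{-1/2}$ depends on $M$, so the term does not vanish and must be bounded directly, as in (E.10).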


Equation (3.2.17): As $c\to0$,
\[
E\bigl[c^{-1}T^{-1}\bigl((\operatorname{tr}\Sigma)^{1/2}W_T^{1/2}-c^{1/2}T\bigr)^2\bigr]\to0.
\]


Proof: Note that it is sufficient to show
\[
E\bigl[c^{-1}T^{-1}\bigl((\operatorname{tr}\Sigma)^{1/2}W_T^{1/2}-U_T^{1/2}\bigr)^2\bigr]\to0\quad\text{as } c\to0
\tag{E.11}
\]
and
\[
E\bigl[c^{-1}T^{-1}\bigl(U_T^{1/2}-c^{1/2}T\bigr)^2\bigr]\to0\quad\text{as } c\to0.
\tag{E.12}
\]


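We begin with (E.11). Arguing as in Appendix A, we obtain
\[
c^{-1}T^{-1}\bigl((\operatorname{tr}\Sigma)^{1/2}W_T^{1/2}-U_T^{1/2}\bigr)^2
 \le TU_T^{-1}\bigl((\operatorname{tr}\Sigma)^{1/2}W_T^{1/2}-U_T^{1/2}\bigr)^2
 \le KT^{-1}\Bigl(\frac{E(S_{T2}(M)\mid\mathcal{A}_T)+a}{S_{T1}/Tk}\Bigr)^2.
\tag{E.13}
\]
Now, $T^{-1}\to0$ a.s., $S_{T1}/Tk\to R^{-1}$ a.s. and $E(S_{T2}(M)\mid\mathcal{A}_T)$ is $O_p(1)$ as $c\to0$. It follows that $c^{-1}T^{-1}\bigl((\operatorname{tr}\Sigma)^{1/2}W_T^{1/2}-U_T^{1/2}\bigr)^2\to0$ as $c\to0$. To establish uniform integrability we apply the inequality $(a^{1/2}-b^{1/2})^2\le|a-b|$, to obtain
\[
\begin{aligned}
c^{-1}T^{-1}\bigl((\operatorname{tr}\Sigma)^{1/2}W_T^{1/2}-U_T^{1/2}\bigr)^2
 &\le TU_T^{-1}\bigl|(\operatorname{tr}\Sigma)W_T-U_T\bigr| \\
 &\le K\Bigl\{\sup_{n\ge n_0}\frac{E(S_{n2}(M)\mid\mathcal{A}_n)}{S_{n1}/n}+\sup_{n\ge n_0}(S_{n1}/nk)^{-1}\Bigr\}.
\end{aligned}
\tag{E.14}
\]
Propositions E.1 and E.2 can be applied here, with slight modification, to show that the r.h.s. of (E.14) is integrable. Dominated convergence completes the justification of (E.11). To establish (E.12) we apply the stopping rule based inequality as in Appendix A. Identical argument gives us that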
\[
E\bigl[c^{-1}T^{-1}\bigl(U_T^{1/2}-c^{1/2}T\bigr)^2\bigr]
 \le 2E(T^{-1})+2E\bigl[TU_T^{-2}(U_{T-1}-U_T)^2\bigr].
\tag{E.15}
\]
Dominated convergence can be applied to show that $E(T^{-1})\to0$, so it remains to consider the second term. Recall that $S_{n1}$ has the representation $\sum_{i=2}^{n}Z_i$, where the $Z_i$'s are conditionally independent. Using this, we have


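\[
\begin{aligned}
|U_{n-1}-U_n|
 &= (\operatorname{tr}\Sigma)\bigl|kS_{n-1,1}((n-1)k+b-2)^{-1}(nk+b-2)^{-1}-Z_n(nk+b-2)^{-1}\bigr| \\
 &\le (\operatorname{tr}\Sigma)\bigl\{kS_{n1}((n-1)k+b-2)^{-1}(nk+b-2)^{-1}+Z_n(nk+b-2)^{-1}\bigr\}.
\end{aligned}
\tag{E.16}
\]
Hence
\[
\begin{aligned}
nU_n^{-2}(U_{n-1}-U_n)^2
 &\le n\bigl\{k((n-1)k+b-2)^{-1}+Z_nS_{n1}^{-1}\bigr\}^2 \\
 &\le 2n\bigl\{k^2((n-1)k+b-2)^{-2}+Z_n^2S_{n1}^{-2}\bigr\} \\
 &\le K_1n^{-1}+K_2(n-1)^{-1}Z_n^2\bigl(S_{n1}/(n-1)\bigr)^{-2}
\end{aligned}
\tag{E.17}
\]
\[
\begin{aligned}
 &\le K_1n^{-1}+K_2\Bigl\{(n-1)^{-1}\sum_{i=2}^{n}(RZ_i)^2\Bigr\}\bigl(RS_{n1}/(n-1)\bigr)^{-2} \\
 &\le K_1n_0^{-1}+K_2\Bigl\{\sup_{n\ge n_0}\Bigl((n-1)^{-1}\sum_{i=2}^{n}(RZ_i)^2\Bigr)
   \sup_{n\ge n_0}\Bigl(\Bigl(\sum_{i=2}^{n}RZ_i\Bigr)\Big/(n-1)\Bigr)^{-2}\Bigr\}.
\end{aligned}
\tag{E.18}
\]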
Equation (E.17) can be used to show that $TU_T^{-2}(U_{T-1}-U_T)^2\xrightarrow{P}0$ as $c\to0$, noting that the $Z_i$ ($i=2,3,\ldots$) are identically distributed. The Schwarz inequality applied to (E.18) provides an integrable dominating function when $(n_0-1)k>9$, since $\bigl\{(n-1)^{-1}\sum_{i=2}^{n}(RZ_i)^2\bigr\}$ and $\bigl\{\bigl((n-1)^{-1}\sum_{i=2}^{n}RZ_i\bigr)^{-1}\bigr\}$ are backward submartingales. Thus (E.12) follows from (E.15)-(E.18) and so (3.2.17) is established.
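The elementary inequality $(a^{1/2}-b^{1/2})^2\le|a-b|$ invoked above, and a companion bound that appears to underlie the passage from square-root differences to $U_T^{-2}(U_{T-1}-U_T)^2$ in (E.15), can be verified directly. A sketch for real numbers $a\ge0$ and $b>0$ (generic symbols, not quantities defined in the text):

```latex
\begin{align*}
\bigl(a^{1/2}-b^{1/2}\bigr)^2
 &= \bigl|a^{1/2}-b^{1/2}\bigr|\,\bigl|a^{1/2}-b^{1/2}\bigr|
  \le \bigl|a^{1/2}-b^{1/2}\bigr|\,\bigl(a^{1/2}+b^{1/2}\bigr)
  = |a-b|, \\
\bigl(a^{1/2}-b^{1/2}\bigr)^2
 &= \frac{(a-b)^2}{\bigl(a^{1/2}+b^{1/2}\bigr)^2}
  \le \frac{(a-b)^2}{b}.
\end{align*}
```

With $a=U_{T-1}$ and $b=U_T$, the second line gives $TU_T^{-1}\bigl(U_{T-1}^{1/2}-U_T^{1/2}\bigr)^2\le TU_T^{-2}(U_{T-1}-U_T)^2$.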


APPENDIX   F 
DETAILS   OF  THEOREM    3.2 


Here,  we  provide  the  details  necessary  to  complete  the  arguments  of  Theorem  3.2. 
We  need  a  preliminary  lemma. 

Lemma F.1: Suppose $N$ is the Bayes stopping rule for the regression parameter estimation problem. Then there exists a number $B>0$ such that
\[
W_N\le BcN^2.
\tag{F.1}
\]


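Proof: Since $N$ is Bayes, on the set $[N=n]$ we must have, for every $k\ge1$,
\[
E\bigl[\|\beta-\hat\beta_{n+k}\|^2\bigm|\mathcal{A}_n\bigr]+c(n+k)\ge E\bigl[\|\beta-\hat\beta_n\|^2\bigm|\mathcal{A}_n\bigr]+cn.
\tag{F.2}
\]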


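Recalling (3.2.7), (F.2) translates to
\[
(\operatorname{tr}G_{n+k})E[R^{-1}\mid\mathcal{A}_n]+(\operatorname{tr}H_{n+k})E[V(M\mid\mathcal{A}_{n+k})\mid\mathcal{A}_n]+c(n+k)
 \ge(\operatorname{tr}G_n)E[R^{-1}\mid\mathcal{A}_n]+(\operatorname{tr}H_n)V(M\mid\mathcal{A}_n)+cn,
\tag{F.3}
\]
where $\operatorname{tr}G_n=\operatorname{tr}(\lambda D^{-1}+M_n)^{-1}$ is decreasing in $n$, and $\operatorname{tr}H_n=\lambda^2\operatorname{tr}(G_nD^{-1}\mathbf{1}_p\mathbf{1}_p^{T}D^{-1}G_n)$. The remainder of the proof is identical to that of Lemma B.1, substituting $\sum_{i=n+1}^{n+k}X_i^{T}V^{-1}X_i$ for $k\Sigma^{-1}$.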

We can now establish (3.2.19)-(3.2.23). Note that (3.2.19) is immediate, since the integrand is a uniformly integrable martingale. With regard to (3.2.21), we have
\[
c^{-1}N^{-2}W_N\le BW_N^{-1}W_N=B,
\tag{F.4}
\]
applying Lemma F.1; the result follows via the Dominated Convergence Theorem. For (3.2.22), we have


\[
\begin{aligned}
c^{-1}N^{-2}V(M\mid\mathcal{A}_N)
 &\le BW_N^{-1}E(M^2\mid\mathcal{A}_N) \\
 &\le BE(R\mid\mathcal{A}_N)E(M^2\mid\mathcal{A}_N) \\
 &\le B\Bigl\{\sup_{n\ge n_0}E(R\mid\mathcal{A}_n)\Bigr\}\Bigl\{\sup_{n\ge n_0}E(M^2\mid\mathcal{A}_n)\Bigr\},
\end{aligned}
\tag{F.5}
\]
using Jensen's inequality and Lemma F.1. The fact that the r.h.s. of (F.5) is integrable follows from the Cauchy-Schwarz inequality and Doob's maximal inequality, since $E(R\mid\mathcal{A}_n)$ and $E(M^2\mid\mathcal{A}_n)$ form uniformly integrable martingales, and hence have integrable last elements. Equation (3.2.22) follows via the Dominated Convergence Theorem. Now, for (3.2.20), using (E.8), arguments in (E.12) and Jensen's inequality,


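\[
\begin{aligned}
c^{-1/2}\bigl(W_N^{1/2}-V_N\bigr)
 &= c^{-1/2}\bigl(W_N^{1/2}-d_NV_N\bigr)+c^{-1/2}(d_N-1)V_N \\
 &\le B^{1/2}NW_N^{-1/2}\bigl(W_N^{1/2}-d_NV_N\bigr)+B^{1/2}NW_N^{-1/2}(d_N-1)V_N \\
 &\le K\Bigl\{E\bigl[W_N^{-1/2}\zeta_N^{-1/2}\bigl[E(S_{N2}(M)\mid\mathcal{A}_N)-S_{N2}(M)\bigr]\bigm|\mathcal{A}_N\bigr]+W_N^{-1/2}V_N\Bigr\} \\
 &\le K\Bigl\{\bigl(S_{N1}/((N-1)p)\bigr)^{-1}E(S_{N2}(M)\mid\mathcal{A}_N)+1\Bigr\} \\
 &\le K\Bigl\{\sup_{n\ge n_0}\bigl(S_{n1}/((n-1)p)\bigr)^{-1}E(S_{n2}(M)\mid\mathcal{A}_n)+1\Bigr\},
\end{aligned}
\tag{F.6}
\]
which is integrable, via Proposition E.2. Again, we can apply dominated convergence to obtain the desired result. To establish (3.2.23), note that for all $c$,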

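\[
\begin{aligned}
0 &\le c^{-1}\Bigl\{E\bigl[\|\beta-\hat\beta_T\|^2+cT\bigr]-E\bigl[\|\beta-\hat\beta_N\|^2+cN\bigr]\Bigr\} \\
 &= E\bigl[2c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}\bigl(W_T^{1/2}-E(R^{-1/2}\mid\mathcal{A}_T)\bigr)\bigr]
  - E\bigl[2c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}\bigl(W_N^{1/2}-E(R^{-1/2}\mid\mathcal{A}_N)\bigr)\bigr] \\
 &\quad + E\bigl[c^{-1}T^{-1}(\operatorname{tr}F_T-\operatorname{tr}\Sigma)W_T\bigr]
  - E\bigl[c^{-1}N^{-1}(\operatorname{tr}F_N-\operatorname{tr}\Sigma)W_N\bigr] \\
 &\quad + E\bigl[c^{-1}T^{-2}\lambda^2\operatorname{tr}(F_TGF_T)V(M\mid\mathcal{A}_T)\bigr]
  - E\bigl[c^{-1}N^{-2}\lambda^2\operatorname{tr}(F_NGF_N)V(M\mid\mathcal{A}_N)\bigr] \\
 &\quad + E\bigl[c^{-1}T^{-1}\bigl((\operatorname{tr}\Sigma)^{1/2}W_T^{1/2}-c^{1/2}T\bigr)^2\bigr]
  - E\bigl[c^{-1}N^{-1}\bigl((\operatorname{tr}\Sigma)^{1/2}W_N^{1/2}-c^{1/2}N\bigr)^2\bigr] \\
 &= o(1)-E\bigl[c^{-1}N^{-1}\bigl((\operatorname{tr}\Sigma)^{1/2}W_N^{1/2}-c^{1/2}N\bigr)^2\bigr]
 \quad\text{as } c\to0,
\end{aligned}
\tag{F.7}
\]
since $N$ is Bayes, and Theorem 3.1 holds. It follows that $E\bigl[c^{-1}N^{-1}\bigl((\operatorname{tr}\Sigma)^{1/2}W_N^{1/2}-c^{1/2}N\bigr)^2\bigr]=o(1)$ as $c\to0$ also; i.e., (3.2.23) holds, and thus Theorem 3.2 is proved.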


APPENDIX    G 
DETAILS   OF   THEOREM   3.3 


Here,  we  establish  that,  under  the  conditions  of  Theorem  3.1, 


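\[
c^{-1}E\|\tilde\beta_T-\hat\beta_T\|^2
 \to\lambda\operatorname{tr}\{\Sigma(D^{-1}-C)\Sigma\}/\operatorname{tr}(\Sigma)
 -\lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\beta,R)]
 \quad\text{as } c\to0.
\tag{G.1}
\]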


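Proof: We first establish pointwise convergence of the integrand. Recall
\[
\hat\beta_n=(M_n+\lambda D^{-1})^{-1}\bigl(P_n+\lambda E(M\mid\mathcal{A}_n)D^{-1}\mathbf{1}_p\bigr),
\qquad
\tilde\beta_n=(M_n+\lambda C)^{-1}P_n,
\]
where $C=D^{-1}-(\mathbf{1}_p^{T}D^{-1}\mathbf{1}_p)^{-1}D^{-1}\mathbf{1}_p\mathbf{1}_p^{T}D^{-1}$. Then, writing $P_n=M_n(M_n^{-1}P_n)$,
\[
\begin{aligned}
\|\tilde\beta_n-\hat\beta_n\|^2
 &= \bigl\|(M_n+\lambda C)^{-1}M_n(M_n^{-1}P_n)-(M_n+\lambda D^{-1})^{-1}\bigl(M_n(M_n^{-1}P_n)+\lambda D^{-1}E(M\mid\mathcal{A}_n)\mathbf{1}_p\bigr)\bigr\|^2 \\
 &= \bigl\|M_n^{-1}P_n-(M_n+\lambda C)^{-1}(\lambda C)M_n^{-1}P_n-M_n^{-1}P_n
   +(M_n+\lambda D^{-1})^{-1}(\lambda D^{-1})\bigl(M_n^{-1}P_n-E(M\mid\mathcal{A}_n)\mathbf{1}_p\bigr)\bigr\|^2 \\
 &= \lambda^2\bigl\|\{(M_n+\lambda D^{-1})^{-1}D^{-1}-(M_n+\lambda C)^{-1}C\}\bigl(M_n^{-1}P_n-E(M\mid\mathcal{A}_n)\mathbf{1}_p\bigr)\bigr\|^2,
\end{aligned}
\tag{G.2}
\]
since $C\mathbf{1}_p=0$. We have, using (G.2),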


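\[
c^{-1}\|\tilde\beta_T-\hat\beta_T\|^2
 = \lambda^2c^{-1}T^{-2}\bigl\|\{(T^{-1}M_T+\lambda T^{-1}D^{-1})^{-1}D^{-1}-(T^{-1}M_T+\lambda T^{-1}C)^{-1}C\}
   \bigl(M_T^{-1}P_T-E(M\mid\mathcal{A}_T)\mathbf{1}_p\bigr)\bigr\|^2.
\tag{G.3}
\]
Now, $c^{-1}T^{-2}\to(\operatorname{tr}\Sigma)^{-1}R$, $M_T^{-1}P_T\to\beta$, $E(M\mid\mathcal{A}_T)\to E(M\mid\beta,R)$, and the matrix expression converges to $\Sigma(D^{-1}-C)$, a.s. as $c\to0$. Hence
\[
c^{-1}\|\tilde\beta_T-\hat\beta_T\|^2
 \to\lambda^2(\operatorname{tr}\Sigma)^{-1}R\,\bigl\|\Sigma(D^{-1}-C)(\beta-E(M\mid\beta,R)\mathbf{1}_p)\bigr\|^2
 \quad\text{a.s. as } c\to0.
\tag{G.4}
\]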


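To obtain an integrable dominating function, we note
\[
\begin{aligned}
c^{-1}\|\tilde\beta_T-\hat\beta_T\|^2
 &\le KU_T^{-1}\bigl\|M_T^{-1}P_T-E(M\mid\mathcal{A}_T)\mathbf{1}_p\bigr\|^2 \\
 &\le K\sup_{n\ge n_0}\bigl(S_{n1}/(nk+b-2)\bigr)^{-1}\bigl\|M_n^{-1}P_n-E(M\mid\mathcal{A}_n)\mathbf{1}_p\bigr\|^2 \\
 &= K\sup_{n\ge n_0}\bigl(S_{n1}/(nk+b-2)\bigr)^{-1}\bigl\|E\bigl[M_n^{-1}P_n-M\mathbf{1}_p\bigm|\mathcal{A}_n\bigr]\bigr\|^2 \\
 &\le K\sup_{n\ge n_0}\bigl(S_{n1}/(nk+b-2)\bigr)^{-1}E\bigl[\|M_n^{-1}P_n-M\mathbf{1}_p\|^2\bigm|\mathcal{A}_n\bigr].
\end{aligned}
\tag{G.5}
\]
The fact that the r.h.s. of (G.5) is integrable follows quickly from Proposition E.2. The dominated convergence theorem is thus applicable. It remains to show that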


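\[
E\bigl[\lambda^2(\operatorname{tr}\Sigma)^{-1}R\,\bigl\|\Sigma(D^{-1}-C)(\beta-E(M\mid\beta,R)\mathbf{1}_p)\bigr\|^2\bigr]
 = \lambda\operatorname{tr}\{\Sigma(D^{-1}-C)\Sigma\}/\operatorname{tr}(\Sigma)
 -\lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\beta,R)].
\tag{G.6}
\]
The argument is identical to that of Appendix C, involving $\theta$, since $\theta$ and $\beta$ have the same distributional forms. The proof is then complete.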


APPENDIX   H 
DETAILS   OF   THEOREM   3.4 


Here,  we  establish  that,  under  the  conditions  of  Theorem  3.1, 


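\[
c^{-1}E\|\hat\beta_T-M_T^{-1}P_T\|^2
 \to\lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma)
 -\lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\beta,R)]
 \quad\text{as } c\to0.
\tag{H.1}
\]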


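Proof: We first establish pointwise convergence of the integrand. Recall that
\[
\begin{aligned}
\hat\beta_n &= (M_n+\lambda D^{-1})^{-1}\bigl(M_n(M_n^{-1}P_n)+\lambda D^{-1}\mathbf{1}_pE(M\mid\mathcal{A}_n)\bigr) \\
 &= M_n^{-1}P_n-(M_n+\lambda D^{-1})^{-1}\lambda D^{-1}M_n^{-1}P_n+(M_n+\lambda D^{-1})^{-1}\lambda D^{-1}\mathbf{1}_pE(M\mid\mathcal{A}_n).
\end{aligned}
\tag{H.2}
\]
Hence
\[
\begin{aligned}
c^{-1}\|\hat\beta_T-M_T^{-1}P_T\|^2
 &= c^{-1}\bigl\|(M_T+\lambda D^{-1})^{-1}\lambda D^{-1}M_T^{-1}P_T-(M_T+\lambda D^{-1})^{-1}\lambda D^{-1}\mathbf{1}_pE(M\mid\mathcal{A}_T)\bigr\|^2 \\
 &= \lambda^2c^{-1}T^{-2}\bigl\|(T^{-1}M_T+\lambda T^{-1}D^{-1})^{-1}D^{-1}\bigl(M_T^{-1}P_T-E(M\mid\mathcal{A}_T)\mathbf{1}_p\bigr)\bigr\|^2.
\end{aligned}
\tag{H.3}
\]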
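Now, $c^{-1}T^{-2}\to(\operatorname{tr}\Sigma)^{-1}R$, $M_T^{-1}P_T\to\beta$, $E(M\mid\mathcal{A}_T)\to E(M\mid\beta,R)$ and $(T^{-1}M_T+\lambda T^{-1}D^{-1})^{-1}D^{-1}\to\Sigma D^{-1}$ a.s. as $c\to0$. It follows that
\[
c^{-1}\|\hat\beta_T-M_T^{-1}P_T\|^2
 \to\lambda^2(\operatorname{tr}\Sigma)^{-1}R\,\bigl\|\Sigma D^{-1}(\beta-E(M\mid\beta,R)\mathbf{1}_p)\bigr\|^2
 \quad\text{a.s. as } c\to0.
\tag{H.4}
\]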


A comparison of (G.3) and (H.3) shows that we can use the same dominating function here as there. It remains to show that
\[
E\bigl[\lambda^2(\operatorname{tr}\Sigma)^{-1}R\,\bigl\|\Sigma D^{-1}(\beta-E(M\mid\beta,R)\mathbf{1}_p)\bigr\|^2\bigr]
 = \lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma)
 -\lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\beta,R)].
\tag{H.5}
\]
Again, the argument follows that of the case for $\theta$ of Appendix D. The proof is then complete.